ABSTRACT

Among the group of extrasolar planets, transiting planets provide a great opportunity 
to obtain direct measurements for the basic physical properties, such as mass and 
radius of these objects. These planets are therefore highly important in the under- 
standing of the evolution and formation of planetary systems; from the observations 
of photometric transits, the interior structure of the planet and atmospheric proper- 
ties can also be constrained. The most efficient way to search for transiting extrasolar 
planets is based on wide-field surveys by hunting for short and shallow periodic dips 
in light curves covering quite long time intervals. These surveys monitor fields with 
several degrees in diameter and tens or hundreds of thousands of objects simultane- 
ously. In the practice of astronomical observations, surveys of large field-of-view are 
rather new and therefore require special methods for photometric data reduction that 
have not been used before. Since 2004, I have participated in the HATNet project, one of
the leading initiatives in the competitive search for transiting planets. Due to the lack
of a software solution capable of handling and properly reducing the yield of such
a wide-field survey, I started to develop a new package designed to perform the
related data processing and analysis. After several years of improvement, the software
package became sufficiently robust and played a key role in the discovery of several 
transiting planets. In addition, various new algorithms for data reduction had to be 
developed, implemented and tested which were relevant during the reduction and the 
interpretation of data. 

In this PhD thesis, I summarize my efforts related to the development of a complete 
software solution for high precision photometric reduction of astronomical images. I 
also demonstrate the role of this newly developed package and the related algorithms 
in the case of particular discoveries of the HATNet project. 
1 INTRODUCTION

In the last two decades, the discovery and characterization of extrasolar
planets became an exciting field of astronomy. The first companion, thought
to be an object roughly 10 times more massive than Earth, was detected around
the pulsar PSR1829-10 (Bailes, Lyne & Shemar 1991). Although this detection
turned out to be false (Lyne & Bailes 1992), shortly afterwards the method of
detecting planetary companions by analyzing pulsar timing variations led to the
successful confirmation of the multiple planetary system around PSR1257+12
(Wolszczan & Frail 1993). The pioneering discovery of a planet orbiting a main
sequence star was announced by Mayor & Queloz (1995). They reported
the presence of a short-period planet orbiting the Sun-like
star 51 Peg. This detection was based on precise radial velocity
measurements with uncertainties at the level of meters per
second. Both discovery methods mentioned above are based
on the fact that all components in a single or multiple plan- 
etary system, including the host star itself, revolve around 
the common barycenter, that is the point in the system hav- 
ing inertial motion. Thus, companions with smaller masses 
offset the barycenter only slightly from the host star whose 
motion is detected, either by the analysis of pulsar timing 
variations or by radial velocity measurements. Therefore, 
such methods - which are otherwise fairly common among 
the investigation techniques of binary or multiple stellar sys- 
tems - yielded success in the form of confirming planets only 
after the evolution of instrumentation. Due to the physical
constraints of these methods, only a lower limit can be placed on the
masses of the planets, and we do not have any information at all on the
sizes of these objects.
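
The lower-limit nature of radial velocity masses can be illustrated with the standard two-body expression for the reflex velocity semi-amplitude of the star, K = (2 pi G / P)^(1/3) m_p sin i / (M_* + m_p)^(2/3) / sqrt(1 - e^2). The sketch below is only an illustration and is not taken from the thesis or its software: the function name and the Sun-Jupiter values are assumptions chosen for the example.

```python
import math

G = 6.674e-11             # gravitational constant [m^3 kg^-1 s^-2]
M_SUN = 1.989e30          # solar mass [kg] (illustrative value)
M_JUP = 1.898e27          # Jupiter mass [kg] (illustrative value)
YEAR = 365.25 * 86400.0   # Julian year [s]


def rv_semi_amplitude(period_s, m_planet, m_star, inclination_deg, ecc=0.0):
    """Reflex velocity semi-amplitude K [m/s] of the host star (two-body orbit)."""
    sin_i = math.sin(math.radians(inclination_deg))
    return ((2.0 * math.pi * G / period_s) ** (1.0 / 3.0)
            * m_planet * sin_i
            / (m_star + m_planet) ** (2.0 / 3.0)
            / math.sqrt(1.0 - ecc ** 2))


# Hypothetical Jupiter analogue seen edge-on (circular orbit assumed).
k_edge_on = rv_semi_amplitude(11.862 * YEAR, M_JUP, M_SUN, 90.0)

# A planet twice as massive, seen at i = 30 deg (sin i = 0.5), gives almost
# the same signal: radial velocities alone constrain only m_p * sin(i).
k_inclined = rv_semi_amplitude(11.862 * YEAR, 2.0 * M_JUP, M_SUN, 30.0)

print(f"K (1 M_Jup, i = 90 deg) = {k_edge_on:.2f} m/s")
print(f"K (2 M_Jup, i = 30 deg) = {k_inclined:.2f} m/s")
```

Both configurations produce a signal of order ten meters per second, which is why instrumentation with a precision of meters per second was needed and why the true mass stays ambiguous without the inclination.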

The discovery of 51 Peg b was followed by numerous other detections, mainly
by the method of radial velocity analysis, yielding the discovery, for instance,
of the first planetary system with two planets around 47 UMa
(Butler & Marcy 1996; Fischer et al. 2002), and the first multiple planetary
system around υ And (Butler et al. 1999). Until the first photometric detection
of planetary transits in the system of HD 209458(b) (Henry et al. 2000;
Charbonneau et al. 2000), no radius estimates could be given for the detected
planets, and all of these had only lower limits for their masses. Transiting
planets provide the opportunity to characterize the size of the planet, and
since the inclination of the orbit is then known, one can derive the mass
of the planet without any ambiguity by combining the results of transit
photometry with the radial velocity measurements. The planetary nature of
HD 209458b was first confirmed by the analysis of radial velocity variations
alone. The first discovery based on photometric detection of periodic dips in
light curves was that of OGLE-TR-56b (Konacki et al. 2003). Since several
scenarios can mimic events that have similar light curves to transiting planets,
confirmation spectroscopy and subsequent analysis of radial velocity data are
still necessary to verify the planetary nature of objects found by transit
searches (Queloz et al. 2001; Torres et al. 2005).
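
How a transit fixes the size and removes the mass ambiguity can be sketched with two textbook relations: the fractional flux decrease is approximately (R_p / R_*)^2, and the true mass follows from the radial velocity minimum mass as m_p = (m_p sin i) / sin i. The snippet below is a minimal illustration that neglects limb darkening and grazing geometries; the function names and the Sun-Jupiter radii are assumptions, not values from the thesis.

```python
import math

R_SUN_KM = 695700.0   # nominal solar radius [km] (illustrative value)
R_JUP_KM = 71492.0    # nominal equatorial radius of Jupiter [km] (illustrative value)


def transit_depth(r_planet, r_star):
    """Fractional flux decrease during transit, ignoring limb darkening."""
    return (r_planet / r_star) ** 2


def true_mass(m_sin_i, inclination_deg):
    """True planetary mass from the radial velocity minimum mass."""
    return m_sin_i / math.sin(math.radians(inclination_deg))


depth = transit_depth(R_JUP_KM, R_SUN_KM)
print(f"Jupiter-size planet, Sun-like star: depth = {100.0 * depth:.2f} %")

# For a transiting planet the orbit is seen nearly edge-on, so the
# correction from minimum mass to true mass is small.
print(f"m_p / (m_p sin i) at i = 85 deg: {true_mass(1.0, 85.0):.4f}")
```

The depth of roughly one percent is the "shallow dip" that transit surveys have to recover from their light curves.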

Since the first identification of planetary objects tran- 
siting their parent stars, numerous additional systems have 
been discovered either by the detection of transits after 
a confirmation based on radial velocity measurements or 
by searching for transit-like signals in photometric time series
and confirming the planetary nature with follow-up spec- 
troscopic and radial velocity data. The former method led 
to the discovery of transits for many well-studied systems, 
such as HD 189733 (planet transiting a nearby K dwarf; Bouchy et al. 2005),
GJ 436 (Butler et al. 2004), HD 17156 (Fischer et al. 2007; Barbieri et al. 2007)
or HD 80606 (the transiting planet with the longest known orbital period of
~111 days; Naef et al. 2001; Moutou et al. 2009). These
planets with transits confirmed later on are found around 
brighter stars since surveys for radial velocity variations 
mainly focus on these. However, the vast majority of the 
currently known transiting extrasolar planets have been de- 
tected by systematic photometric surveys, fully or partially 
dedicated to planet searches. Such projects monitor either faint targets
using telescopes with small field-of-view or bright targets involving large
field-of-view optical instrumentation. Some of the projects focused on the
monitoring of smaller fields are the Monitor project (Irwin et al. 2006;
Aigrain et al. 2007), the Deep MMT Transit Survey (Hartman et al. 2008),
a survey for planetary transits in the field of NGC 7789 by Bramich et al. (2005),
the "Survey for Transiting Extrasolar Planets in Stellar Systems" by
Burke et al. (2004), the "Planets in Stellar Clusters Extensive Search"
(Mochejska et al. 2002, 2006), the "Single-field transit survey toward the Lupus"
of Weldrake et al. (2008), the SWEEPS project (Sahu et al. 2006), and the
Optical Gravitational Lensing Experiment (OGLE) (Udalski et al. 1993, 2002;
Konacki et al. 2003). Projects monitoring wide fields are the Wide Angle Search
for Planets (WASP, SuperWASP, see Street et al. 2003; Pollacco et al. 2004;
Cameron et al. 2007), the XO project (McCullough et al. 2005, 2006),
the Hungarian-made Automated Telescope project (HATNet, Bakos et al. 2002, 2004),
the Transatlantic Exoplanet Survey (TrES, Alonso et al. 2004), the Kilodegree
Extremely Little Telescope (KELT, Pepper, Gould & DePoy 2004; Pepper et al. 2007),
and the Berlin Extrasolar Transit Search project (BEST, Rauer et al. 2003).
One should mention here the existing space-borne projects, the CoRoT mission
(Barge et al. 2008) and the Kepler mission, launched successfully on
7 March 2009 (Borucki et al. 2007). Both missions are dedicated (in part) to
searching for transiting extrasolar planets. As of March 2009, the above
mentioned projects announced 57 planets. 6 planets were found by radial
velocity surveys where transits were confirmed after the detection of RV
variations (GJ 436b, HD 149026b, HD 17156b, HD 80606b, HD 189733b and HD 209458b),
while the other 51 were discovered and announced by one
of the above mentioned surveys. The CoRoT mission announced 7 planets, of
which 4 had published orbital and planetary data; the OGLE project reported
data for 7 planets, and an additional planet with existing photometry in
the OGLE archive has also been confirmed by an independent group
(Snellen et al. 2008); the Transatlantic Exoplanet Survey reported the
discovery of 4 planets; the XO project has detected and confirmed 5 planets;
the SWEEPS project found 2 planets; the SuperWASP project announced
14 + 1 planets, although 2 of them are known only from conference
announcements; and the HATNet project has 10 + 1 confirmed planets.
The planet WASP-11/HAT-P-10b was a shared discovery: it was confirmed
independently by the SuperWASP and HATNet groups (this common discovery is
denoted above by the +1 term). The HATNet project also independently
confirmed the planetary nature of the object XO-5b (Pál et al. 2008d).

All of the above mentioned wide-field surveys involve optical designs that
yield a field-of-view of several degrees; moreover, the KELT project monitors
areas with a size of a thousand square degrees (hence the name, "Kilodegree
Extremely Little Telescope"). The calibration and data reduction for such
surveys revealed various problems that were not present in the image
processing of "classical" data (obtained by telescopes with slow focal ratios
and therefore smaller field-of-view). Some of the difficulties that occur are
the following. Even the calibration frames themselves have to be filtered
carefully, in order to avoid any significant structures (such as patches of
clouds in the flat field images). Images taken by fast focal ratio optics have
significant vignetting, therefore the calibration process should track its
side effects, such as the variations in the signal-to-noise level across the
image. Moreover, fast focal ratio yields comatic aberration and therefore
systematic spatial variations in the stellar profiles. Such variations not only
make the source extraction and star detection algorithms more sensitive but are
also one of the major sources of the correlated noise (or red noise) present in
the final light curves^1. Due to the large field-of-view and the numerous
individual objects present in the image, the source identification and the
derivation of the proper "plate solution" for these images is also a
non-trivial issue. The photometry itself is made harder by the very narrow
and therefore undersampled sources. Unless the effects of the undersampled
profiles and the spatial motions of the stellar profiles are handled with care,
photometric time series are affected by strong systematics. Due to the short
fractional duration and the shallow flux decrease of the planetary transits,
several thousands of individual frames with proper photometry are required for
significant and reliable detection. Since hundreds of thousands of stars are
monitored simultaneously during the observation of a single field, the image
reduction process yields an enormous amount of photometric data, i.e. billions
of individual photometric measurements. In fact, hundreds of gigabytes up to
terabytes of processed images and tabulated data can be associated with a
single monitored field. Even the most common operations on such a
large amount of data require special methods.

^1 The time variation of stellar profiles is what causes red noise.
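
Why several thousand frames are needed can be made plausible with a common back-of-the-envelope estimate: for uncorrelated (white) noise, the detection signal-to-noise ratio of a box-shaped transit is roughly the depth divided by the per-point scatter, multiplied by the square root of the number of in-transit points. The sketch below assumes this white-noise model only; the function names and the numbers (1% depth, 1% per-point scatter, 3% of the orbit spent in transit, target ratio of 10) are hypothetical and not taken from the HATNet data. Red noise, as discussed above, makes the real requirement stricter.

```python
import math


def transit_snr(depth, sigma_per_point, n_frames, duty_cycle):
    """White-noise detection S/N of a box-shaped transit."""
    n_in_transit = n_frames * duty_cycle
    return depth / sigma_per_point * math.sqrt(n_in_transit)


def frames_needed(depth, sigma_per_point, duty_cycle, snr_target):
    """Number of frames required to reach snr_target under white noise."""
    n_in_transit = (snr_target * sigma_per_point / depth) ** 2
    return math.ceil(n_in_transit / duty_cycle)


# Hypothetical survey star: 1% deep transit, 1% photometric scatter per frame,
# 3% of the orbital period spent in transit, S/N of 10 required.
n_frames = frames_needed(depth=0.01, sigma_per_point=0.01,
                         duty_cycle=0.03, snr_target=10.0)
print(f"frames needed: {n_frames}")
print(f"S/N reached:   {transit_snr(0.01, 0.01, n_frames, 0.03):.2f}")
```

Multiplying a few thousand frames by hundreds of thousands of monitored stars per field gives the billions of individual measurements mentioned in the text.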

The Hungarian Automated Telescope (HAT) project was initiated by
Bohdan Paczyński and Gáspár Bakos (Bakos et al. 2002). Its successor, the
Hungarian-made Automated Telescope Network (Bakos et al. 2004), is a network
of small telescopes with large field-of-view, dedicated to an all-sky
variability survey and the search for planetary transits. In the past years,
the project has become one of the most successful of its kind, having
discovered almost one fifth of the known transiting extrasolar planets. After
joining the project in 2004, the author's goal was to overcome the above
mentioned issues and problems related to the image processing of the HATNet
data. In this thesis, the efforts for the development of a software package
and its related applications in the HATNet project are summarized.

This PhD thesis has five chapters. Following the In- 
troduction, the second chapter, "Algorithms and Software 
environment" discusses the newly developed and applied al- 
gorithms that form the basis of the photometry pipeline, and 
gives a description of the primary concepts of the related
software package in which these algorithms are implemented. 
The third chapter, "HATNet discoveries" describes a partic- 
ular example for the application of the software on the anal- 
ysis of the HATNet data. This application and the discussion 
is related to the discovery of the planet HAT-P-7b, transit- 
ing a late F star on a quite tight orbit. The fourth chapter, 
"Follow-up observations" focuses on the post-discovery mea- 
surements (including photometric and radial velocity data) 
of the eccentric transiting exoplanetary system of HAT-P- 
2b. The goals, methods and theses are summarized in the 
fifth chapter. 



2 ALGORITHMS AND SOFTWARE 
ENVIRONMENT 

In principle, data reduction or simply reduction is the process by which the
raw data obtained by the instrumentation are transformed into a more useful
form. In fact, raw data can be analyzed during acquisition in order to modify
the instrumentation parameters for the subsequent measurements^2.
However, in the practice of astronomical data analysis, all raw data are
treated as known in advance of the reduction process. Moreover, the term
"more useful form" of data is highly specific and depends on our needs.
Regarding photometric exoplanetary studies, this "more useful form" means two
things. First, as in the case of HATNet where the discoveries are based on
long-term photometric time series, reduction ends at the stage of analyzed
light curves, where transit candidates are recovered as the result of this
analysis. Second, additional high-precision photometry^3 yields precise
information directly about the planet itself. One should mention here that
other types of measurements involving advanced and/or space-borne techniques
(for instance, near-infrared photometry of secondary eclipses) follow the same
principles of reduction. The basics of the reductions are roughly the same,
and such observations yield even more types of planetary characteristics, such
as brightness contrast or surface temperature distribution.

^2 For instance, in the case of HATNet, real-time astrometric guiding is used
to tweak the mount coordinates in the cases when the telescope drifts away from
the desired celestial position. This guiding basically uses the same algorithms
and routines that are involved in the photometric reduction. Likewise,
simplified forms of photometry can be used in the case of follow-up
measurements of exoplanetary candidate host stars: if light curve variations
show unexpected signals, the observation schedule could be changed accordingly
to save expensive telescope time.

^3 Combined with additional techniques, such as spectroscopy or stellar
evolution modelling. The confirmation of the planetary nature by radial
velocity measurements is essential.

The primary platform for data reduction is the computer, and the reduction
processes are performed by dedicated software systems. As mentioned in the
introduction, existing software solutions lack several relevant components that
are needed for a consistent analysis of the HATNet data flow. One of our goals
was to develop a software package that features all of the functionality
required for the proper reduction of the HATNet and the related follow-up
photometry. The package itself is named fi/fihat, referring to both the HATNet
project and the invocation of the related individual programs.

In the first major chapter of this PhD thesis, I summarize both the algorithms
and their implementations that form the base of the fi/fihat software package.
Due to the difficulties of the undersampled and wide-field photometry, several
new methods and algorithms that were missing from existing and available image
reduction packages had to be developed, tested and implemented. These
difficulties are summarized in the next section (Sec. 2.1), while the
capabilities and related problems of existing software solutions are discussed
in Sec. 2.2.

The following sections describe the details of the algorithms and methods,
focusing primarily on those that do not have any known implementation in any
publicly available and/or commercial software. Sec. 2.3 discusses the details
of the calibration process. Sec. 2.4 describes how the point-like sources
(stars) are detected, extracted and characterized from the images. The details
of the astrometry and the related problems, such as automatic source
identification and obtaining the plate solution, are explained in Sec. 2.5,
and the details of the image registration process are discussed in Sec. 2.6.
Sec. 2.7 summarizes the problems related to the instrumental photometry.
Sec. 2.8 describes the concepts of the "image subtraction" process, which is
mainly the derivation of a proper convolution transformation between two
registered images. Sec. 2.9 explains how the photometry can be optimally
performed on convolved or subtracted images. Sec. 2.10 describes the major
concepts of how the still remaining systematic light curve variations can be
removed.

Figure 1. Plot showing the light curve scatter (rms, in magnitudes) of a mock
star with various FWHMs and having a flux of 10000 electrons. The light curve
rms is plotted as a function of the subpixel-level inhomogeneity, Q. Supposing
a pixel structure where the pixel is divided into two rectangles of the same
size, Q is defined as the difference in the normalized quantum efficiencies of
these two parts (i.e. Q = 0 represents completely uniform sensitivity, and at
Q = 1 one of the parts is completely insensitive, which is typical for
front-illuminated detectors).

In Sec. 2.11, after the above listed description of the
crucial steps of the whole image reduction and photometry
process, I outline the major principles of the newly developed
software package. This part is then followed by the
detailed description of the individual components of the software
package. Finally, the chapter ends with practical guidance on how
this package can be used to perform
the complete image reduction process.

2.1 Difficulties with undersampled, crowded and 
wide-field images 

In this section we summarize effects that are prominent in
the reduction of the HATNet frames when compared to the
"classical" style of image reductions. The difficulties can be
categorized into three major groups. These groups represent
almost completely different kinds of problems; however,
all of them result from the type of the survey. Namely,
these problems are related to the undersampled nature of the images,
the crowding of the sources of interest,
and the large field-of-view of the images. In this section we
examine what particular problems arise due to these properties.



2.1.1 Undersampled images 

At first glance, an image can be considered undersampled if the source
profiles are "sharp". The most prevalent quantity that characterizes the
sharpness of the (mostly stellar) profiles is the full width at half maximum
(FWHM). This parameter is the diameter of the contour that connects the points
having half of the source's peak intensity. Undersampled images therefore have
(stellar) profiles with small FWHM, basically comparable to the pixel size.
In the following, we list the most prominent effects of such a "small" FWHM
and also examine the practical limit below which this "small" is really small.
In this short section we demonstrate the consequences of various effects that
are prominent in the photometry of stellar profiles with small FWHMs. All of
these effects worsen the quality of the photometry unless special attention is
paid to reducing them.

2.1.1.1 Subpixel structure The effect of the subpixel
structure is relevant when the characteristic length of the
flux variations becomes comparable to the scale length of
the pixel-level sensitivity variations in the CCD detector.
The latter result mostly from the presence of the gate
electrodes on the surface of the detector, which block the
photons at certain regions of a given pixel. Therefore, this
structure not only reduces the quantum efficiency of the chip, but
also makes the signal depend on the centroid position of the incoming
flux: the sharper the profile, the larger the dependence on
the centroid position. As regards photometry, subpixel
structure yields a non-negligible correlation between the raw
and/or instrumental magnitudes and the fractional centroid
positions. Advanced detectors such as back-illuminated CCD
chips reduce the side effects of subpixel structure and also
have larger quantum efficiency. Fig. 1 shows that the effect
of the subpixel structure on the quality of the photometry
dominates for sharp stars, where FWHM < 1.2 pixels.
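
The dependence of the measured magnitude on the fractional centroid position can be reproduced with a small noise-free experiment in the spirit of Fig. 1. The sketch below is an illustration under stated assumptions, not the simulation used for the figure: each pixel is split into a left half with sensitivity 1 and a right half with sensitivity 1 - Q, the star is a Gaussian, and the problem is treated in one dimension because the split runs along a single axis. All function names are hypothetical.

```python
import math

FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gauss_cdf(x, sigma):
    """Cumulative distribution of a zero-centred Gaussian."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def detected_fraction(x0, fwhm, q, half_width=12):
    """Fraction of the stellar flux detected for a centroid at x0.

    Assumed pixel model: pixel k spans [k, k + 1); its right half
    [k + 0.5, k + 1) has its sensitivity reduced by the factor (1 - q).
    """
    sigma = fwhm * FWHM_TO_SIGMA
    on_right_halves = 0.0
    for k in range(-half_width, half_width + 1):
        on_right_halves += (gauss_cdf(k + 1.0 - x0, sigma)
                            - gauss_cdf(k + 0.5 - x0, sigma))
    return 1.0 - q * on_right_halves


def magnitude_rms(fwhm, q, n_positions=200):
    """Rms of the magnitude over uniformly distributed fractional centroids."""
    mags = [-2.5 * math.log10(detected_fraction(i / n_positions, fwhm, q))
            for i in range(n_positions)]
    mean = sum(mags) / len(mags)
    return math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))


for fwhm in (0.9, 1.2, 2.0, 3.0):
    print(f"FWHM = {fwhm:.1f} px, Q = 0.5: rms = {magnitude_rms(fwhm, 0.5):.2e} mag")
```

In this toy model the scatter vanishes for Q = 0 and drops steeply as the profile widens, which is the qualitative behavior described above.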

2.1.1.2 Spatial quantization and the size of the
aperture On CCD images, aperture photometry is the simplest technique to
derive fluxes of individual point sources. Moreover, advanced methods such as
photometry based on PSF fitting or image subtraction also involve aperture
photometry on the fit residuals and the difference images, thus the properties
of this basic method should be well understood. In principle, an aperture is a
certain region around a source. For nearly symmetric sources, this aperture is
generally a circular region with a pre-defined radius. Since the image itself
is quantized (i.e. the fluxes are known only for each pixel), at the boundary
of the aperture the per-pixel flux must be properly weighted by the area of the
intersection between the aperture and the pixel. Aperture photometry is
implemented in almost all of the astronomical data reduction software packages
(see e.g. Stetson 1987). As is known from the literature (Howell 1989), both
small and large apertures yield small signal-to-noise ratio (SNR) or relatively
high light curve scatter (or root mean square, rms). A small aperture contains
a small amount of flux, therefore Poisson noise dominates. For large apertures,
the background noise reduces the SNR. Of course, the size of the optimal
aperture depends on the total flux of the source as well as on the magnitude of
the background noise. For fainter sources, this optimal aperture is smaller,
with a radius approximately in the range of the profile FWHM, while for
brighter stars it is a few times larger than the FWHM (see also Howell 1989).
However, for very narrow/sharp sources, the above mentioned naive noise
estimation becomes misleading. As seen in the subsequent panels of Fig. 2,
the actual light curve scatter is a non-trivial oscillating function of the
aperture size, and this oscillation decreases and becomes negligible only for
stellar profiles wider than FWHM > 4.0 pixels. Moreover, a "bad" aperture can
yield a light curve rms about 3 times higher than expected for very narrow
profiles. The oscillation has a characteristic period of roughly 0.5 pixels.
It is worth mentioning that this dependence of the light curve scatter on the
aperture radius is a direct consequence of the topology of intersecting
circles and squares. Let us consider a set of circles with the same radius,
drawn randomly on a grid of squares. The actual number of the squares that
intersect a given circle depends on the circle centroid position. Therefore,
if the circles are drawn uniformly, this number of intersecting squares has
a well defined scatter. In Fig. 3 this scatter is plotted as a function of the
circle radius. As can be seen, this scatter oscillates with a period of nearly
0.5 pixels. Although this problem is much simpler than the problem of light
curve scatter discussed above, the function that describes the dependence of
the scatter in the number of intersecting squares on the circle radius has the
same qualitative behavior (with the same period and positions of local minima).
This is an indication of a non-trivial source of noise present in the light
curves if the data reduction is performed (at least partially) using the
method of aperture photometry. In the case of HATNet, the typical FWHM is
between ~2 and 3 pixels. Thus the selection of a proper aperture in the case
of simple and image subtraction based photometry is essential. The methods
intended to reduce the effects of this quantization noise are discussed later
on, see Sec. 2.10.

Figure 2. Light curve scatter for mock stars (with 1% photon noise rms) when their flux is derived using
aperture photometry. The subsequent panels show the scatter for increasing stellar profile FWHM, assuming an aperture size between
1 and 5 pixels. The thick dots show the actual measured scatter while the dashed lines represent the lower limit of the light curve rms,
derived from the photon noise and the background noise.
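
The circle-and-grid experiment described above is easy to repeat. The following sketch is a minimal illustration and not the code behind Fig. 3: it assumes that a square "intersects" the circle when it overlaps the disk bounded by it, and it replaces random centroids by a regular sub-pixel grid of centroid positions so that the result is deterministic. The function names are hypothetical.

```python
import math


def squares_hit(cx, cy, radius):
    """Count unit grid squares [i, i+1] x [j, j+1] that overlap the disk."""
    count = 0
    for i in range(math.floor(cx - radius), math.floor(cx + radius) + 1):
        for j in range(math.floor(cy - radius), math.floor(cy + radius) + 1):
            # Distance from the disk centre to the nearest point of the square.
            dx = max(i - cx, 0.0, cx - (i + 1))
            dy = max(j - cy, 0.0, cy - (j + 1))
            if dx * dx + dy * dy < radius * radius:
                count += 1
    return count


def count_scatter(radius, n_sub=20):
    """Standard deviation of the square count over sub-pixel centroid positions."""
    counts = []
    for a in range(n_sub):
        for b in range(n_sub):
            cx = (a + 0.5) / n_sub
            cy = (b + 0.5) / n_sub
            counts.append(squares_hit(cx, cy, radius))
    mean = sum(counts) / len(counts)
    return math.sqrt(sum((c - mean) ** 2 for c in counts) / len(counts))


# Scan the radius in steps of 0.1 pixel to inspect how the scatter varies.
for step in range(10, 21):
    r = step / 10.0
    print(f"radius = {r:.1f}  scatter = {count_scatter(r):.3f}")
```

Printing the scatter on a finer radius grid allows the oscillation discussed in the text to be inspected directly; the same counting routine also shows why the number of pixels contributing to an aperture changes with the centroid position.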

Figure 3. If circles with a fixed radius are drawn randomly and
uniformly on a grid of squares, the number of intersecting squares
has a well-defined scatter (since the number of squares intersecting
the circle depends not only on the radius of the circle but also on the
centroid position). The plot shows this scatter as a function of
the radius.

2.1.1.3 Spline interpolation As discussed later on, one of the relevant
steps in the photometry based on image subtraction is the registration
process, when the images to be analyzed are spatially shifted to the same
reference system. As is known, the most efficient way to perform such
a registration is based on quadratic or cubic spline interpolation. Let us
suppose a sharp structure (such as a narrow, undersampled stellar profile)
that is shifted using a transformation aided by cubic spline interpolation.
In Fig. 4 a series of one-dimensional sharp profiles are shown for various
FWHMs between ~1 and ~3 pixels, before and after the transformation. As can be
seen, for very narrow stars, the resulting structure has values smaller than
the baseline of the original profile. For extremely sharp (FWHM ≈ 1) profiles,
the magnitude of these undershoots can be as high as 10-15% of the peak
intensity. Moreover, the difference between the shifted structure and a
fiducial profile centered on the shifted position also has a specific
oscillating structure. The magnitude of such oscillations decreases
dramatically if the FWHM is increased. For profiles with FWHM ≈ 3, the
amplitude of such oscillation is about a few thousandths of the peak intensity
(of the original profile). If the photometry is performed by the technique of
image subtraction, such effects yield systematics in the photometry. Attempts
to reduce these effects are discussed later on (see Sec. 2.9).

Figure 4. One-dimensional stellar profiles for various FWHMs, shifted using spline interpolation. The profiles on the upper stripe show
the original profile while the plots in the middle stripe show the shifted ones. All of the profiles are Gaussian profiles (with the same
total flux) and centered at x0 = 10.7. The shift is done rightwards with an amplitude of Δx = 0.4. The plots in the lower stripe show
the difference between the shifted profiles and a fiducial sampled profile centered at x = x0 + Δx = 10.7 + 0.4 = 11.1.

2.1.1.4 Profile modelling Regarding undersampled images, one should mention
some relevant details of profile modelling. In most of the data reduction
processes, stellar profiles detected on CCD images are characterized by simple
analytic model functions. These functions (such as the Gaussian or Moffat
function) have a few parameters that are related to the centroid position,
peak intensity and the profile shape. During the extraction of stellar sources
the parameters of such model functions are adjusted to obtain a best fit
solution for the profile. In order to perform a self-consistent modelling, one
should integrate the model function over each pixel and fit these integrals to
the pixel values, instead of sampling the model function on a square grid and
fitting these samples to the pixel values. Although the calculation of such
integrals and their parametric derivatives^4 is computationally expensive,
neglecting this effect yields systematic offsets in the centroid positions and
a systematic overestimation of the profile size (FWHM). Since the plate
solution is based on the individual profile centroid coordinates, such
simplification in the profile modelling yields additional systematics in the
final light curves^5. Moreover, precise profile modelling is essential in
reducing the previously discussed spline interpolation side effect. As an
example, in Fig. 5 we show how the fitted FWHM is overestimated when proper
profile modelling is neglected, if the profile model function is Gaussian.

Figure 5. This plot shows how the profile FWHM is overestimated
by the simplification of the fit. The continuous line shows
the fitted FWHM if the model function is sampled at the pixel
centers (instead of being integrated properly over the pixels). The dashed
line shows the identity function, for comparison purposes.

^4 Parametric derivatives of the model functions are required by
most of the fitting methods.

^5 For photometry, the final centroid positions are derived from
the plate solution and a catalogue. Therefore, systematic variations
in the plate solution indirectly yield systematic variations
in the photometry and in the light curves.
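
The overestimation of the FWHM by a sampled (rather than pixel-integrated) model can be checked with a one-dimensional toy fit. The sketch below is an illustration under simplifying assumptions and not the fitting code of the thesis: the "observed" pixel values are exact pixel integrals of a Gaussian, the centroid is held fixed at its true position, the amplitude is solved linearly, and the width is found by a plain grid scan. All names are hypothetical.

```python
import math

SIGMA_TO_FWHM = 2.0 * math.sqrt(2.0 * math.log(2.0))


def integrated_gaussian(k, x0, sigma):
    """Flux of a unit-flux Gaussian collected by the pixel [k - 0.5, k + 0.5]."""
    s = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf((k + 0.5 - x0) / s) - math.erf((k - 0.5 - x0) / s))


def fit_sampled_fwhm(true_fwhm, x0=0.0, half_width=10, n_grid=2000):
    """Fit a Gaussian sampled at pixel centres to pixel-integrated data.

    Returns the best-fit FWHM of the (inconsistent) sampled model.
    """
    sigma = true_fwhm / SIGMA_TO_FWHM
    pixels = range(-half_width, half_width + 1)
    data = [integrated_gaussian(k, x0, sigma) for k in pixels]
    lo, hi = 0.5 * sigma, 2.0 * sigma + 0.5      # scanned range of widths
    best_chi2, best_sigma = float("inf"), sigma
    for i in range(n_grid + 1):
        s = lo + (hi - lo) * i / n_grid
        model = [math.exp(-0.5 * ((k - x0) / s) ** 2) for k in pixels]
        # Linear least-squares amplitude for this trial width.
        amp = sum(d * m for d, m in zip(data, model)) / sum(m * m for m in model)
        chi2 = sum((d - amp * m) ** 2 for d, m in zip(data, model))
        if chi2 < best_chi2:
            best_chi2, best_sigma = chi2, s
    return best_sigma * SIGMA_TO_FWHM


for fwhm in (1.0, 1.5, 2.0, 3.0):
    print(f"true FWHM = {fwhm:.1f}  fitted (sampled model) = {fit_sampled_fwhm(fwhm):.3f}")
```

For wide profiles the excess is close to what one expects from adding the variance of a unit-width box (1/12 pixel squared) to the Gaussian variance, while for profiles near one pixel wide the relative overestimate becomes substantial.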
2.1.2 Crowded images

Since the beginning of the "CCD era", various dense stellar fields, such as
globular clusters or open clusters, have been monitored for generic
photometric analysis and variability search. The main problems
of such crowded images are well known and several
attempts have been made to reduce the side effects
resulting from the merging profiles. In this section some of
the problems are discussed briefly.

2.1.2.1 Merging and sensitivity to profile sharpness 

The merging of adjacent stellar profiles has basically two
consequences from the point of view of photometry. First, it is
hard to derive the background intensity and noise level around a
given target star. Stars in the background area can be excluded
in two ways: the pixels belonging to such profiles can be
ignored either by treating them as outliers, or by removing the
pixels in the proximity of other photometric centroids from the
set of background pixels. The second consequence of profile
merging is that flux from adjacent stars is likely to "flow"
into the target aperture. Moreover, the amount of such
additional flux depends extremely strongly on the profile FWHMs,
and therefore variations in the widths of the profiles cause a
significant increase in the light curve scatter.
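The two ways of excluding neighbouring stars from the background estimate can be sketched as follows. This is an illustrative Python example only, not the implementation used in the package: the function names, the iterative clipping rule and the fixed exclusion radius are all assumptions made for the sake of the example.

```python
import numpy as np

def annulus_pixels(image, center, r_in, r_out, neighbours=(), exclusion_radius=3.0):
    """Background pixels in an annulus around center = (x, y).

    Pixels closer than exclusion_radius to any known neighbouring
    centroid are removed (the second method described in the text).
    """
    yy, xx = np.indices(image.shape)
    r = np.hypot(xx - center[0], yy - center[1])
    use = (r >= r_in) & (r <= r_out)
    for nx, ny in neighbours:
        use &= np.hypot(xx - nx, yy - ny) > exclusion_radius
    return image[use]

def clipped_background(pixels, n_sigma=3.0, max_iter=10):
    """Median and scatter after iterative outlier rejection (the first method)."""
    values = np.asarray(pixels, dtype=float).ravel()
    for _ in range(max_iter):
        median = np.median(values)
        scatter = np.std(values)
        keep = np.abs(values - median) <= n_sigma * scatter
        if keep.all():
            break
        values = values[keep]
    return float(np.median(values)), float(np.std(values))
```

Both routes can be combined: known centroids are excluded geometrically, and the clipping then takes care of faint stars that are missing from the centroid list.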

2.1.2.2 Modelling The modelling of stellar profiles,
both by analytical model functions and by empirical point-spread
functions, is definitely made more difficult in the case of
merging sources. In this case the detected stars cannot be
modelled separately, thus a joint fit should be performed
simultaneously on all of the stars, or at least on the ones that
are close enough to each other to have significant overlap
in the model functions. In the case of extremely
crowded fields, sophisticated grouping and/or iterative
procedures should be employed, otherwise the computation of
the inverse matrices (associated with the parameter fitting)
is not feasible (see also Stetson 1987).

As we will see later on, the method of difference image
photometry helps efficiently to reduce these side effects
related to the crowding of the images. However, this is true
only for differential photometry, i.e. during the photometry of
the reference frames® these problems still emerge.



2.1.3 Large field-of-view 

In addition to the previously discussed issues, the large size
of the field-of-view also introduces various difficulties.

2.1.3.1 Background variations Images covering a large
field-of-view on the sky are expected to contain various
background structures, such as thin cirrus clouds, or scattered
light due to dusk, dawn or the proximity of the Moon or even



° In principle, the method of differential photometry derives the
flux of objects on a target image by adding the flux of the objects
on a reference image to the flux of the residual on the image
calculated as the difference between the target and the reference
image.



Table 1. Typical astrometric residuals as a function of the
polynomial transformation order, for absolute and relative
transformations. For absolute transformations the reference is an
external catalog, while for relative transformations the reference
is one of the frames.

Order   Absolute          Relative
1       0.841 - 0.859     0.117 - 0.132
2       0.795 - 0.804     0.049 - 0.061
3       0.255 - 0.260     0.048 - 0.061
4       0.252 - 0.259     0.038 - 0.053
5       0.086 - 0.096     0.038 - 0.053
6       0.085 - 0.096     0.038 - 0.053
7       0.085 - 0.095     0.038 - 0.053
8       0.085 - 0.095     0.038 - 0.053
9       0.085 - 0.095     0.038 - 0.053




interstellar clouds. These background variations make it
impossible to derive a generic background level. Moreover,
the background level cannot be characterized by simple
functions such as polynomials or splines, since it has no
specific scale length. Because of the lack of a well-defined
background level, the source extraction algorithm is required to
be purely topological (see also Sec. 2.4).

2.1.3.2 Vignetting, signal-to-noise level and effective
gain A large field-of-view can only be achieved by
fast focal ratio optical designs. Such optical systems do not
have negligible vignetting, i.e. the effective sensitivity of the
whole system decreases towards the corners of the image. In the
case of the HATNet optics, such vignetting can be as strong as 1
to 10. Namely, the total incoming flux at the corners of the
image can be as small as one tenth of the flux at the center
of the image. Although flat-field corrections eliminate this
vignetting, the signal-to-noise ratio is unchanged. Since the
latter is determined by the electron count, increasing the
flux level reduces the effective gain^ at the corners of the
images. Since the expected photometric quality
(light curve scatter and/or signal-to-noise ratio) depends
strongly on this specific gain value, the information about this
consequence of vignetting should be propagated through the whole
photometric process.
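The relation between vignetting and effective gain can be made explicit with a short numerical sketch. It is an illustration under stated assumptions, not code from the package: the nominal gain of 4 electrons/ADU is a hypothetical value, only photon (Poisson) noise is considered, and `transmission` denotes the fraction of the central flux that reaches a given part of the detector.

```python
import math

def effective_gain(gain, transmission):
    """Electrons represented by one flat-field-corrected ADU.

    Flat-fielding divides the signal by `transmission`, so each
    corrected ADU stands for fewer detected electrons.
    """
    return gain * transmission

def photon_limited_snr(corrected_adu, gain, transmission):
    """Poisson-limited SNR: N / sqrt(N) = sqrt(N) for N detected electrons."""
    electrons = corrected_adu * effective_gain(gain, transmission)
    return math.sqrt(electrons)

GAIN = 4.0  # hypothetical nominal gain in electrons/ADU
for label, transmission in (("center", 1.0), ("corner", 0.1)):
    snr = photon_limited_snr(10000.0, GAIN, transmission)
    print(f"{label}: effective gain = {effective_gain(GAIN, transmission):.2f}, SNR = {snr:.1f}")
```

For the same flat-fielded signal level, the corner pixels in this example have a signal-to-noise ratio lower by a factor of sqrt(10), which is why the local effective gain has to accompany the image through the later steps.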

2.1.3.3 Astrometry Distortions due to the large
field-of-view affect the astrometry and the source identification.
Such distortions can efficiently be quantified with polynomial
functions. After the sources are identified, the optimal
polynomial degree (the order of the fit) can easily be
obtained by calculating the unbiased fit residuals. For a sample
series of HATNet images we computed these fit residuals, as
shown in Table 1. It can easily be seen that the residuals
do not decrease significantly beyond the 5th or 6th order if an
external catalogue is used as a reference, while the optimal
polynomial degree is around 3 or 4 if one of the images is

^ Although interstellar clouds are steady background structures,
from the point of view of the analysis of a single image, these
cause the same kind of features on the image.

* The gain is defined as the joint electron/ADU conversion ratio 
of the amplifier and the A/D converter. A certain CCD camera 
may have a variable gain if the amplification level of the signal 
read from the detector can be varied before digitization. 
used as a reference. The complex problem of the astrometry
is discussed in Sec. 2.5 in more detail.
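The selection of the polynomial order from unbiased fit residuals can be sketched as follows. This Python example is only an illustration on synthetic data: the distortion model, the noise level and the function names are assumptions, and only one of the two transformed coordinates is fitted. The residual is called unbiased in the sense that the sum of squares is divided by the number of points minus the number of fitted coefficients.

```python
import numpy as np

def poly_terms(x, y, order):
    """Design matrix with all monomials x^i y^j for i + j <= order."""
    return np.column_stack([x**i * y**j
                            for i in range(order + 1)
                            for j in range(order + 1 - i)])

def unbiased_residual(x, y, target, order):
    """RMS fit residual, corrected for the number of fitted coefficients."""
    A = poly_terms(x, y, order)
    coeff, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coeff
    dof = len(target) - A.shape[1]
    return float(np.sqrt(np.sum(resid**2) / dof))

# Synthetic example: a third-order distortion plus centroid noise.
rng = np.random.default_rng(0)
x, y = rng.uniform(-1.0, 1.0, (2, 2000))
measured = x + 0.5 * x**3 + 0.2 * x * y**2 + rng.normal(0.0, 0.01, x.size)
residuals = {order: unbiased_residual(x, y, measured, order) for order in range(1, 7)}
for order, value in residuals.items():
    print(order, f"{value:.4f}")
```

In this synthetic case the residuals stop decreasing at order 3, the true order of the distortion, which mirrors the way the plateaus in Table 1 are read.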

2.1.3.4 Variations in the profile shape parameters 

Fast focal ratio optical instruments have significant comatic
aberrations. The comatic aberration not only yields elongated
stellar profiles, but the elongation parameters (as well
as the FWHMs themselves) also vary across the image. Since, as
was demonstrated, many steps of a complete photometric
reduction depend on the profile sizes and shapes, the proper
derivation of the shape variations is also a relevant issue.

Summary 

In this section we have summarized various effects of
image undersampling, crowding and large field-of-view that
directly or indirectly affect the quality of the photometry.
Although each of the distinct effects can be well quantified,
in practice all of these occur simultaneously. The lack of
a complete and consistent software solution that would be
capable of overcoming these and further related problems led
us to start the development of a program designed for these
specific problems.

In the next section we review the most widespread software
solutions in the field of astronomical photometric data
reduction.

2.2 Problems with available software solutions 

In the past decades, several software packages intended to
perform astronomical data reduction have become available to the
general public. The most widely recognized package is the
Image Reduction and Analysis Facility (IRAF), distributed
by the National Optical Astronomy Observatories (NOAO).
With the exception of image subtraction based photometry,
many algorithms related to photometric data reduction
have been implemented in the framework of IRAF
(for instance, DAOPHOT, see e.g. Stetson 1987, 1989). The
first public implementation of image convolution and the
related photometry was given by Alard & Lupton (1998), in
the form of the ISIS package. This package focuses on
certain steps of the procedure, but it is not a complete
solution for data reduction (i.e. ISIS alone is not sufficient
to derive light curves from raw CCD frames; in this case
other packages must be involved, since ISIS lacks several
functionalities). The SExtractor package
of Bertin & Arnouts (1996) is intended to search for, classify
and characterize sources of various shapes and brightnesses.
This program was designed for extragalactic surveys; however,
it has several built-in methods for photometry as well.
Of course, there are several other independent packages or
wrappers for the previously mentioned ones®. Table 2 gives
a general overview of the advantages and disadvantages of
the previously discussed software packages. Currently,
these packages alone do not provide sufficient
functionality for the complete and consistent photometric
reduction of the HATNet frames. In the following, we
focus on those particular problems that arise during the
photometric reduction of images similar to the HATNet
frames and for which, as of this writing, no publicly
available software solution exists.

2.3 Calibration and masking 

For astronomical images acquired by CCD detectors, the
aim of the calibration process is twofold. The first goal is
to reduce the effect of both the differential light sensitivity
characteristics of the pixels and the large-scale variations
yielded by the telescope optics. The second goal is to mark
the pixels that must be excluded from further data
reduction because the previously mentioned corrections cannot
be performed for various reasons. The most common
reasons are the saturation and blooming
of bright pixels, cosmic ray events or malfunctioning pixels
(such as pixels with highly nonlinear response or with
extraordinary dark current). Of course, some of these effects
vary from image to image (e.g. saturation or cosmic
ray events) while others (such as nonlinear pixels) have
a constant structure.

In this section the process of the calibration is described,
briefly discussing the sensitivity corrections, followed by a
somewhat more detailed explanation of the masking procedure
(since it is a relevant improvement compared to the
existing software solutions). Finally, we show how these masks
are realized and stored in practice. The actual software
implementation related to the calibration process is described
later in Sec. 2.12.

2.3.1 Steps of the calibration process

Basically, the calibration of all of the image frames, almost
independently of the instrumentation, is done using
bias, dark and flat images and overscan correction
(where an appropriate overscan section is available on the
detector). These calibration steps correct for the light
sensitivity inhomogeneities, with the exception of nonlinear
responses, effects that depend on spatial and/or
temporal variations in the telescope position or in the sky
background^" and second-order sensitivity effects^'^. In practice,
the linear corrections provided by the classic calibration
procedure are acceptable, as in the case of HATNet image
calibrations.

Let us consider an image $I$ and denote its calibrated
form by $C(I)$. If the basic arithmetic operators between two
images are defined as per pixel operations, $C(I)$ can be
derived as

$$ C(I) = \frac{I - O(I) - B_0 - \dfrac{t[I]}{t[D_0]}\, D_0}{F_0 / \|F_0\|}, \qquad (1) $$

where $O(I)$ is the overscan level^'^ of the image $I$; $B_0$, $D_0$ and



® These wrappers allow the user to access functionalities from 
external data processing environments. For instance, the astro- 
nomical reduction package of the IDL environment uses the IRAF 
as a back-end, or the package PyRAF provides access to IRAF 
tasks within the Python language. 



Such as scattered light, multiple reflections in the optics or
fringing caused by variations in the sky background spectrum.

Such as the shutter effect.

Derived from the pixel values of the overscan area. The large 
scale structure of the overscan level is modelled by a simple func- 






Table 2. Comparison of some of the existing software solutions for astronomical image processing and data reduction. All of these
software systems are available to the general public; however, this does not automatically mean that the particular software is free or open
source. This table focuses on the most widespread software packages, and we omit the "wrappers" that otherwise allow access to such
programs from different environments (for instance, the processing of astronomical images in IDL uses IRAF as a back-end).

IRAF

Pros

• Image Reduction and Analysis Facility. The most commonly recognized software for astronomical data reduction, with a large
literature and numerous references.

• IRAF supports the functionality of the package DAOPHOT, one of the most frequently used software solutions for aperture
photometry and PSF photometry, with various fine-tuning parameters.

• IRAF is a complete solution for image analysis: no additional software is required if the general functionality and built-in
algorithms of IRAF (up to instrumental photometry) are sufficient for our demands.

Cons

• Not open source software. Although the higher level modules and tasks are implemented in IRAF's own programming language, the
back-end programs have unpublished source code. Therefore, many of the tasks and jobs are done by a kind of "black box", with no real
knowledge about its actual implementation.

• Old-style user interface. The primary user interface of IRAF follows the archaic designs and concepts of the eighties. Moreover,
many options and parameters reflect the hardware conditions of that time (for instance, reading and writing data from/to tapes,
assuming a very small memory size in which the images do not fit, and so on).

• Lack of functionality required for the proper processing of wide-field images. For instance, there is no particularly effective
implementation for astrometry or for light curve processing (such as transposing photometric data to light curves and doing some sort
of manipulation on the light curves, such as de-trending).

ISIS

Pros

• Image subtraction package. The first software solution employing image subtraction based photometry.

• The program performs all of the necessary steps related to the image subtraction algorithm itself and to the photometry as well.

• Fully open source software. It comes with some shell scripts (written in C shell) that demonstrate the usage of the program and
are also intended to perform the whole process (including image registration, a fit for the convolution kernel and photometry).

Cons

• Not a complete software solution in a wider context. Additional software is required for image calibration, source detection and
identification, and also for the manipulation of the photometric results.

• Although this piece of software has an open source codebase, the algorithmic details and some tricks related to the photometry on
subtracted images are not documented (i.e. neither in the reference scientific articles nor in the program itself).

• The kernel basis used by ISIS is fixed. The built-in basis involves a set of functions that can easily and successfully be applied
to images with wider stellar profiles, but is not efficient on images with narrow and/or undersampled profiles.

• Some intermediate data are stored in blobs. Such blobs may contain useful information for further processing (such as the kernel
solution itself), but access to these blobs is highly inconvenient.

SExtractor

Pros

• Source-Extractor. Widely used software package for extracting and classifying various kinds of sources from astronomical images.

• Open source software.

• Ability to perform photometry on the detected sources.

Cons

• The primary goal of SExtractor was to be a package that focuses on source classification. Therefore, this package is not a complete
solution for the general problem; it can be used only for certain steps of the whole data reduction.

• Photometry is also designed for extended sources.

IRAF is distributed by the National Optical Astronomy Observatories, which are operated by the Association of Universities for Research in Astronomy,
Inc., under cooperative agreement with the National Science Foundation. See also http://iraf.net/.

DAOPHOT is a standalone photometry package, written by Peter Stetson at the Dominion Astrophysical Observatory (Stetson 1987).

ISIS is available from http://www2.iap.fr/users/alard/package.html with additional tutorials and documentation (Alard & Lupton 1998; Alard 2000).

SExtractor is available from http://sextractor.sourceforge.net/, see also Bertin & Arnouts (1996).


$F_0$ are the master calibration images of bias, dark and flat,
respectively. We denote the exposure time of the image $x$ by
$t[x]$, while $\|x\|$ denotes the norm of the image $x$, that is
simply the mean or median of the pixel values. In practice, when
any of the above master calibration images does not exist
in advance, one can substitute zero for it or, in the
case of flat images, an arbitrary positive constant value. The
master calibration frames are the per pixel mean or median

tion (such as spline or polynomial) and this function is then ex- 
trapolated to the image area. 



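averages (with optional $n$-$\sigma$ rejection) of individual frames:

$$ C(B_i) = B_i - O(B_i), \qquad (2) $$

$$ B_0 = \langle C(B_i) \rangle, \qquad (3) $$

$$ C(D_i) = D_i - O(D_i) - B_0, \qquad (4) $$

$$ D_0 = \langle C(D_i) \rangle, \qquad (5) $$

$$ C(F_i) = F_i - O(F_i) - B_0 - \frac{t[F_i]}{t[D_0]}\, D_0, \qquad (6) $$

$$ F_0 = \langle C(F_i) \rangle. \qquad (7) $$
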



Equations (2), (4) and (6) clearly show that during the
calibration of the individual bias, dark and flat frames, only
overscan correction, overscan correction and a master bias
frame, and overscan correction, a master bias and a master
dark frame are used, respectively.
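The chain of calibration steps can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the package: the function names are hypothetical, the overscan region is assumed to occupy the last `n_osc` columns and to be described by a single constant level, the master frames are plain per pixel medians without rejection, and the calibrated science image is assumed to follow the usual form implied by the definitions of $t[x]$ and $\|x\|$ (overscan, bias and exposure-scaled dark subtracted, then divided by the normalized master flat).

```python
import numpy as np

def split(frame, n_osc):
    """Return the image area and a constant overscan level O(frame).

    Assumption: the last n_osc columns form the overscan region.
    """
    return frame[:, :-n_osc].astype(float), float(np.median(frame[:, -n_osc:]))

def master(calibrated_frames):
    """Per pixel median of individually calibrated frames, eqs. (3), (5), (7)."""
    return np.median(np.stack(calibrated_frames), axis=0)

def calibrate_bias(frame, n_osc):                          # equation (2)
    data, level = split(frame, n_osc)
    return data - level

def calibrate_dark(frame, n_osc, B0):                      # equation (4)
    data, level = split(frame, n_osc)
    return data - level - B0

def calibrate_flat(frame, n_osc, B0, D0, t_frame, t_D0):   # equation (6)
    data, level = split(frame, n_osc)
    return data - level - B0 - (t_frame / t_D0) * D0

def calibrate_image(frame, n_osc, B0, D0, F0, t_frame, t_D0):
    """Assumed form of C(I): subtract overscan, bias, scaled dark; divide by F0/||F0||."""
    data, level = split(frame, n_osc)
    flat = F0 / np.median(F0)                              # ||F0|| taken as the median
    return (data - level - B0 - (t_frame / t_D0) * D0) / flat
```

The order of the calls follows the dependencies in the equations: the master bias is needed for the darks, and both are needed for the flats, before any science frame can be calibrated.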

2.3.2 Masking 

As mentioned earlier, pixels having some undesirable
properties must be masked in order to exclude them
from further processing. The fi/fihat package, and therefore
the pipeline of the whole reduction, supports various kinds of
masks. These masks are transparently stored in the image
headers (using special keywords) and are preserved even if an
independent software modifies the image. Technically, this
mask is a bitwise combination of Boolean flags, assigned to
various properties of the pixels. In this section we briefly
summarize our masking method.

First, before any further processing and right after the
readout of the images, a mask is added to mark the bad
pixels of the image. Bad pixels are not only hot pixels but
also pixels where the readout is highly nonlinear or the readout
noise is definitely larger than the average for the given
detector. These bad pixel masks are determined after a couple of
sky flats have been acquired. Sky flats are well suited for
estimating nonlinearity and readout noise deviations,
since during dusk or dawn images are exposed with different
exposure times yielding approximately the same flux, and all
of the pixels have a locally uniform incoming flux. See Bakos
(2004) for further details.

Second, all saturated pixels are marked with a saturation
mask. In practice, there are two kinds of effects related to
saturation: 1) the pixel itself has an intensity that
reaches the maximum expected ADU value, or 2) if there is
no support for anti-blooming in the detector, charges from
saturated pixels can overflow into the adjacent ones during
readout. These two types of saturation are distinguished by
the oversaturation mask and the blooming mask. If either of
these masks is set, the pixel itself is treated as saturated.
We note that this saturation masking procedure is also done
before any calibration.
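A simple way to derive the two saturation masks from a raw frame is sketched below in Python. This is an assumption-laden illustration rather than the procedure used in the package: the rule applied here marks a pixel as blooming if its lower or upper neighbour (for a vertical readout direction) is oversaturated, and the threshold `limit` is a hypothetical parameter.

```python
import numpy as np

def saturation_masks(image, limit):
    """Return (oversaturated, blooming) Boolean masks.

    Assumed rule: a pixel blooms if the pixel directly below or above it,
    along a vertical readout direction, is oversaturated.
    """
    oversat = image >= limit
    bloom = np.zeros_like(oversat)
    bloom[1:, :] |= oversat[:-1, :]   # neighbour below is oversaturated
    bloom[:-1, :] |= oversat[1:, :]   # neighbour above is oversaturated
    return oversat, bloom

# Small example: a vertical streak of three saturated pixels.
frame = np.full((8, 5), 1000)
frame[3:6, 2] = 65535
oversat, bloom = saturation_masks(frame, limit=65000)
saturated = oversat | bloom           # treated as saturated if either mask is set
```

With this rule the streak is extended by one pixel at each end, so charge that has leaked into the neighbours of a saturated column is also excluded.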

Third, after the calibration is done, additional masks 
can be added to mark the hot pixels (that were not corrected 
by subtracting the dark image) , cosmic ray events and so on. 

Currently, the latest version of the package supports the
following masks:

• Mask for faulty pixels. These pixels show strong non- 
linearity. These masks are derived occasionally from the ra- 
tios of flat field images with low and high intensities. 

• Mask for hot pixels. The mean dark current for these 
pixels is significantly higher than the dark current of normal 
pixels. 

• Mask for cosmic rays. Cosmic rays cause sharp structures
that mostly resemble hot or bad pixels,
but these do not have a fixed structure that is known in
advance.

• Mask for outer pixels. After a geometric transformation 
(dilation, rotation, registration between two images), certain 
pixels near the edges of the frame have no corresponding pix- 
els in the original frame. These pixels are masked as "outer" 
pixels. 

• Mask for oversaturated pixels. These pixels have an
ADU value that is above a certain limit defined near the
maximum value of the A/D conversion (or below it if the
detector shows a general nonlinear response at higher signal
levels).

• Mask for blooming. When the detector has
no antiblooming feature or this feature is turned off,
extremely saturated pixels cause "blooming" in certain
directions (usually parallel to the readout direction). The A/D
conversion value of the blooming pixels does not reach the
maximum value of the A/D conversion, but these pixels should
also be treated as saturated to some extent. The "blooming"
and "oversaturated" pixels are commonly referred to as
"saturated" pixels, i.e. the logical combination of these two
respective masks indicates pixels that are related to the
saturation and its side effects.

• Mask for interpolated pixels. Since cosmic rays and
hot pixels can be easily detected, in some cases it is worth
replacing these pixels with an interpolated value derived from
the neighboring pixels. However, these pixels should only be
used with caution, therefore they are indicated by such a
mask for the further processing steps.

We found that the above categories of 7 distinct masks are
sufficient for all kinds of applications appearing in the data
processing. The fact that there are 7 masks, each of which
can be stored in a single bit for a given pixel, makes the
implementation quite easy. All bits of the mask corresponding
to a pixel fit in a byte and we still have an additional
bit. This is rather convenient during the implementation of
certain steps (e.g. the derivation of the blooming mask from
the oversaturated mask), since there is temporary storage
space for a bit that can be used for an arbitrary purpose.
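The seven flags and the spare bit can be represented, for illustration, with a Python `IntFlag`. The numerical values follow the list given with the MASKINFO encoding (1, 2, 4, 8, 16, 32 and 64), while the class name, the member names and the helper function are hypothetical and do not come from the package.

```python
from enum import IntFlag

class PixelMask(IntFlag):
    FAULTY = 1
    HOT = 2
    COSMIC = 4
    OUTER = 8
    OVERSATURATED = 16
    BLOOMING = 32
    INTERPOLATED = 64

# "Saturated" is the logical combination of the two saturation-related masks.
SATURATED = PixelMask.OVERSATURATED | PixelMask.BLOOMING

# The eighth bit of the byte is not assigned to any mask and is free
# for temporary bookkeeping.
SCRATCH_BIT = 128

def is_saturated(mask_value):
    """True if either the oversaturated or the blooming flag is set."""
    return bool(mask_value & SATURATED)

pixel = PixelMask.HOT | PixelMask.BLOOMING
print(int(pixel), is_saturated(pixel))
```

Storing the flags this way keeps the whole mask of a pixel in one unsigned byte, so an image-sized mask costs one byte per pixel while held in memory.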

2.3.3 Implementation 

The basic per pixel arithmetic operations required by the
calibration process are implemented in the program fiarith
(see Sec. 2.12.2), while individual operations on the
associated masks can be performed using the fiign program
(Sec. 2.12.6). Although the distinct steps of the calibration
can be performed by the appropriate subsequent invocation
of the above two programs, a more efficient implementation
is given by ficalib (Sec. 2.12.5), which allows fast evaluation
of equation (1) on a large set of images. Moreover, ficalib
also creates the appropriate masks upon request. The master
calibration frames (referred to as $B_0$, $D_0$ and $F_0$ in
equation 1) are created by the combination of the individual
calibration images (see equations 3, 5 and 7), involving the
program ficombine (Sec. 2.12.4). See also Sec. 12.1231 for more
specific examples about the application of these programs.

As mentioned earlier, the masks are stored in the
FITS header using special keywords. Since the pixels that need
to be masked represent only a small fraction of the total CCD
area, only the information (i.e. mask type and coordinates)
about these masked pixels is written to the header. By default,
all other pixels are "good". A special form of run-length
encoding is used to compress the mask itself, and the
compressed mask is then represented by a series of integer
numbers. This series of integers should be interpreted as follows.
Depending on the values of these numbers, a virtual "cursor"
is moved along the image. After each movement, the
pixel under the cursor or a rectangle whose lower-left corner
MASKINFO= '1 -32 16,8 -16 0,1 -2 -32 1,1 -2,1 -16 1,0:2 - -1,1:3,3 -32 3,2'

MASKINFO= '-16 -3,1:4 0,1 3,3 -32 3,0 -3,3 -16 1,0:2 0,1 -2 -32 1 -1,2'




Figure 6. Stamp showing a typical saturated star. The images cover an approximately 8' × 5' area (32 × 20 pixels) of the sky, taken
by one of the HATNet telescopes. The blooming structure can be seen well. The left panel shows the original image itself. In the right
panel, oversaturated pixels (where the actual ADU values reach the maximum of the A/D converter) are marked with right-diagonal
stripes while pixels affected by blooming are marked with left-diagonal stripes. Note that most of the oversaturated pixels are also
blooming ones, since their lower and/or upper neighboring pixels are also oversaturated. Such pixels are therefore marked with both
left- and right-diagonal stripes. Since the readout direction in this particular detector was vertical, the saturation/blooming structure is
also vertical. The "MASKINFO" blocks seen below the two stamps show how this particular masking information is stored in the FITS
headers in the form of special keywords.



Value     Interpretation

T         Use type T encoding. T = 0 implies absolute cursor movements, T = 1 implies relative cursor
          movements. Other values of T are reserved for optional further improvements.
-M        Set the current bitmask to M. M must be between 1 and 127 and it is a bitwise combination of
          the numbers 1, 2, 4, 8, 16, 32 and 64, for faulty, hot, cosmic, outer, oversaturated, blooming and
          interpolated pixels, respectively.
x,y       Move the cursor to the position (x,y) (in the case of T = 0) or shift the cursor position by (x,y)
          (in the case of T = 1) and mark the pixel with the mask value of M.
x,y:h     Move/shift the cursor to/by (x,y) and mark the horizontal line having the length of h and left
          endpoint at the actual position.
x,y:-v    Move/shift the cursor to/by (x,y) and mark the vertical line having the length of v and lower
          endpoint at the actual position.
x,y:h,w   Move/shift the cursor to/by (x,y) and mark the rectangle having a size of h × w and lower-left
          corner at the actual cursor position.

Figure 7. Interpretation of the tags found in the MASKINFO keywords in order to decode the respective mask. The values of M, h, v and w
must always be positive.



is at the current cursor position is masked accordingly. In
Fig. 6 an example is shown demonstrating the masks
in the case of a saturated star (from one of the HATNet
images). The respective encoded masks (as stored literally in
the FITS header) can be seen below the image stamps. The
encoding scheme is summarized in Fig. 7. We found that this
type of encoding (and the related implementation) provides
an efficient way of storing such masks. Namely, the encoding
and decoding require a negligible amount of computing time
and the total information about the masking requires a few
dozen of these "MASKINFO" keywords, i.e. the size of the
FITS image files increases only by 3-5 kbytes (i.e. by less
than 1%).
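The decoding rules of Fig. 7 can be sketched as a short Python function. This is a hypothetical decoder written only from the rules quoted above, not the code of the package, and it rests on several assumptions: tokens are separated by whitespace, the cursor starts at (0,0), it stays at the moved position after marking, marking is a bitwise OR, the encoding is absolute until a T token is seen, and for rectangles the first number is taken as the horizontal and the second as the vertical extent. The example string is invented for the test and is not taken from a real header.

```python
import numpy as np

def decode_maskinfo(text, shape):
    """Decode a MASKINFO-style string into a uint8 mask of shape (ny, nx)."""
    mask = np.zeros(shape, dtype=np.uint8)
    relative = False      # assumed default: absolute cursor movements
    current = 0           # current bitmask M
    cx = cy = 0           # assumed initial cursor position
    for token in text.split():
        if "," not in token:
            value = int(token)
            if value < 0:
                current = -value            # "-M": select the bitmask
            else:
                relative = (value == 1)     # "T": 0 absolute, 1 relative
            continue
        position, _, extent = token.partition(":")
        x, y = (int(v) for v in position.split(","))
        cx, cy = (cx + x, cy + y) if relative else (x, y)
        if not extent:
            width, height = 1, 1                        # single pixel
        elif "," in extent:
            width, height = (int(v) for v in extent.split(","))  # rectangle
        elif extent.startswith("-"):
            width, height = 1, -int(extent)             # vertical line
        else:
            width, height = int(extent), 1              # horizontal line
        mask[cy:cy + height, cx:cx + width] |= np.uint8(current)
    return mask

example = "0 -16 2,1:-3 -32 2,0:-5 -2 0,0 1 -64 1,1:2,2"
decoded = decode_maskinfo(example, (6, 5))
```

When several MASKINFO keywords are present, their values would be joined into one string before decoding; whether the cursor and bitmask state carry over between keywords is another assumption of this sketch.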

2.4 Detection of stars 

Calibration of the images is followed by the detection of stars.
A successful detection of star-like objects is important not
only for the reduction of the data: for the telescopes
of HATNet it is also used in situ for guiding and slewing
corrections.

In the typical field-of-view of a HATNet telescope there
are $10^4 - 10^5$ stars with suitable signal-to-noise ratio (SNR),
which are proper candidates for photometry. Additionally,
there are several hundreds of thousands, or millions of stars
which are also easy to detect and characterize but are not used
for further photometry. The HATNet telescopes acquire images
that are highly crowded and undersampled, due to the
fast focal ratio instrumentation (f/1.8 for the lenses used by
the HAT telescopes). Because of the large field-of-view, the
sky background also varies rapidly on an ordinary image
frame, due to the large-scale structure of the Milky Way,
atmospheric clouds, differential extinction or other light
scattering effects. Due to the fast focal ratio, the vignetting
effects are also strong, so that stars of the same magnitude
have different SNR in the center of the images and in the
corners. This focal ratio also results in stars with different
shape parameters, i.e. systematically and heavily varying
FWHM and elongation even for focused images (this effect
is known as comatic aberration or coma). These parameters
may also vary due to the different sky conditions, e.g. the
airmass, which also varies across the large FOV (12-15 degrees
in diameter).

Because of these, one should expect the following properties
from a star detection and characterization algorithm that
is able to overcome the above-mentioned problems.

A) The method should be local both in the sense of pixel
positions and in the intensity. Namely, the result of an object
detection must not differ if one applies an affine
transformation to the intensity or a spatial shift to the image.

B) The method should not contain any characteristic scale
length, due to the unpredictable scale length of the
background. It also implies that there should not be any kind of
"partitioning of the image into blocks" during the detection,
i.e. one should not expect that any of the above-mentioned
effects disappear if the image is divided into certain blocks
and some quantities are treated as constants in such a block.
Moreover, there is no "background" for the image, and one
cannot even do any kind of interpolation to determine a smooth
background.

C) The algorithm should be fast. Namely, it is expected
to be an $O(N)$ algorithm, where $N = S_x \times S_y$ is the total
number of image pixels. In other words, the computing
time is expected to be nearly independent of the number
and/or the density of the detected objects.

D) On highly crowded and undersampled images, stars
should be distinguished even if they are very close to each
other. Thus, the direct detection should not be preceded by
a convolution with a kernel function, as is done in the
most common algorithms and software (e.g. as used in
DAOPHOT/FIND, see Stetson 1987). Although this preliminary
convolution increases the detectability of low surface
brightness objects, in our case it would fuse nearby stars.

E) The algorithm should have as few external
fine-tuning parameters as possible.

F) The algorithm should explicitly assign the pixels to the
appropriate detected objects.

G) Last but not least, the algorithm should work properly
not only for undersampled and crowded images but also for
images acquired by "classic" types of telescopes, where the
average FWHM of the stars is larger and/or the number
density is lower. Additionally, one may expect such an
algorithm to handle the cases of smeared or defocused images,
even when the star profiles have a "doughnut" shape, as
well as the proper characterization of digitized photographic
data (e.g. POSS/DSS).

In this section we give an algorithm that satisfies
the above criteria. Moreover, it is purely topological since
it considers only "less than" or "greater than" relations
between adjacent pixel intensities. Obviously, an algorithm
that relies only on the topology automatically satisfies
conditions A and B above. The first part of this section discusses
how the image can be partitioned into smaller partitions that
are sets of cohesive pixels belonging to the same star (see
also condition F above). The second part of the section
describes how these partitions/stars can be characterized by a
couple of numbers, such as centroid coordinates, flux, and
shape parameters.

2.4.1 Image partitioning

2.4.1.1 Pixel links and equivalence classes. The first
step of the detection algorithm is to define local pixel
connections with the following properties. An ordinary pixel
has 8 neighbors; the number of neighbors is smaller only
if the pixel is a boundary pixel (in which case there can
be 5 or 3 neighbors) or if any of the neighboring pixels
are excluded by a mask of bad, hot or saturated pixels.
From the set consisting of the examined pixel, with the
coordinates $x$ and $y$, and its neighbors, we select the pixel
with the largest intensity. Let us denote the coordinates of
this pixel by $n_x(x,y)$ and $n_y(x,y)$. For a shorter notation, we
introduce $\mathbf{x} = (x,y)$ and $\mathbf{n}(\mathbf{x}) = [n_x(x,y), n_y(x,y)]$. Obviously,
$|n_x - x| \le 1$ and $|n_y - y| \le 1$, i.e. $\|\mathbf{n}(\mathbf{x}) - \mathbf{x}\|_\infty \le 1$, where
$\|\mathbf{x}\|_\infty = \max(|x|, |y|)$ is the maximum norm. The derivation of
this set of $\mathbf{n} = (n_x, n_y)$ points requires $\mathcal{O}(N)$ time. Second,
we define $\mathbf{m}(\mathbf{x}) = [m_x(x,y), m_y(x,y)]$ for a given pixel by

$$\mathbf{m}(\mathbf{x}) = \begin{cases} \mathbf{x} & \text{if } \mathbf{n}(\mathbf{x}) = \mathbf{x}, \\ \mathbf{m}(\mathbf{n}(\mathbf{x})) & \text{otherwise.} \end{cases} \tag{8}$$

Note that this definition of $\mathbf{m}(\mathbf{x})$ depends only on
the relation $\mathbf{x} \to \mathbf{n}(\mathbf{x})$: there is no need to know
the underlying neighborhood structure or the partial ordering between
pixels. This definition results in a set of finite pixel links $\mathbf{x}$,
$\mathbf{n}(\mathbf{x})$, $\mathbf{n}(\mathbf{n}(\mathbf{x})) = \mathbf{n}^2(\mathbf{x})$, ..., where the length $L$ of a link is
the smallest value for which $\mathbf{n}^L(\mathbf{x}) = \mathbf{n}^{L+1}(\mathbf{x}) = \mathbf{m}(\mathbf{x})$. Third,
we define two pixels, say, $\mathbf{x}_1 = (x_1, y_1)$ and $\mathbf{x}_2 = (x_2, y_2)$, to
be equivalent if $\mathbf{m}(\mathbf{x}_1) = \mathbf{m}(\mathbf{x}_2)$. This equivalence relation
partitions the image into disjoint sets, the equivalence classes. In
other words, each equivalence class contains links with the same
endpoint. Let us denote these classes by $C_i$.

Each class $C_i$ is represented by the pixel
$\mathbf{m}(C_i)$, which is, by definition, a local maximum. Each
equivalence class can be considered as a possible star, or a
part of a star if the image was defocused or smeared. In
Fig. 8 one can see stamps from a typical image obtained by
one of the HATNet telescopes and the derived pixel links and
the respective equivalence classes. In the figure, the mapping
$\mathbf{x} \to \mathbf{n}(\mathbf{x})$ is represented by the $\mathbf{n}(\mathbf{x}) - \mathbf{x}$ vectors, originating
from the pixel $\mathbf{x}$.
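
The construction of the links and equivalence classes can be illustrated with a minimal Python sketch. It is not the fistar implementation: the function name `partition`, the representation of the image as a list of rows, and the tie-breaking rule for equal intensities are assumptions made only for this illustration.

```python
def partition(img, mask=None):
    """Return {pixel: m(pixel)} for every unmasked pixel of a 2D list `img`.

    Pixels are (row, column) tuples; `mask[r][c]` is True for excluded pixels.
    """
    h, w = len(img), len(img[0])
    if mask is None:
        mask = [[False] * w for _ in range(h)]

    def hood(p):
        # The pixel itself plus its unmasked neighbours (at most 9 pixels).
        r, c = p
        return [(rr, cc)
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if not mask[rr][cc]]

    # n(x): brightest pixel of the neighbourhood. Ties are broken towards the
    # smallest (row, column); this assumption keeps the links free of cycles.
    n = {}
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                n[(r, c)] = max(hood((r, c)),
                                key=lambda q: (img[q[0]][q[1]], -q[0], -q[1]))

    # m(x): follow x -> n(x) -> n(n(x)) ... until a fixed point is reached.
    # Every pixel is put on a path at most once, so this loop is O(N) overall.
    m = {}
    for start in n:
        path, p = [], start
        while p not in m and n[p] != p:
            path.append(p)
            p = n[p]
        end = m.get(p, p)
        for q in path + [p]:
            m[q] = end
    return m


img = [[1, 2, 1, 1, 1],
       [2, 9, 3, 2, 1],
       [1, 3, 2, 7, 2],
       [1, 1, 2, 3, 1]]
classes = set(partition(img).values())  # two classes, peaks at (1, 1) and (2, 3)
```

Only comparisons between adjacent intensities are used, so adding a constant to the image or rescaling it by a positive factor leaves the partitions unchanged.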

Figure 8. Left panel: a stamp of 128 x 128 pixels from a typical crowded HAT image, covering approximately a 0.5° x 0.5° area on the
sky. Middle panel: the central area of the stamp shown in the left panel, covering approximately 7' x 7'. This smaller stamp
has a size of 32 x 32 pixels. Right panel: the links and equivalence classes generated from the smaller stamp. Note that even the faintest
stars are detected and the pixels belonging to them form separate partitions (for an example see the stars encircled in the middle panel).

2.4.1.2 Background. Let us denote the number of possible
neighbors of a given pixel $\mathbf{x}$ by $K_0(\mathbf{x})$. As
described above, it is 8 for an ordinary pixel and 5
or 3 for boundary pixels, and it can be smaller if some of the
surrounding pixels are masked. The quantity $R(\mathbf{x})$ is defined as the
cardinality of the set $\{\mathbf{x}' \in \mathrm{Image} : \mathbf{n}(\mathbf{x}') = \mathbf{x}\}$. Let us also define
$K(\mathbf{x}) = K_0(\mathbf{x}) + 1$ and $G(\mathbf{x})$ as the cardinality of

$$\{\mathbf{x}' \in \mathrm{Image} : \|\mathbf{x}' - \mathbf{x}\|_\infty \le 1 \text{ and } \mathbf{m}(\mathbf{x}') = \mathbf{m}(\mathbf{x})\}, \tag{9}$$

which is the number of pixels in the neighborhood of $\mathbf{x}$ (including
$\mathbf{x}$ itself) that belong to the same class.
For a given equivalence class $C$ we can define its background
pixels by

$$B(C) = \{\mathbf{x}' \in C : R(\mathbf{x}') = 0 \text{ and } G(\mathbf{x}') < K(\mathbf{x}')\}. \tag{10}$$

These pixels are the starting points of pixel links that lie on the
boundary of the equivalence class. Note that this definition may
not reflect the true background if there are merging stars in
the vicinity: in such a case these pixels are saddle points
between two or more stars. However, the median of the pixel
intensities in the set $B(C)$ is a good estimate of the
local background for the star candidate $C$, even for highly
crowded images. For simplicity, let us denote the background
of $C$ by

$$\langle B(C) \rangle = \mathrm{median}\,\{ I(\mathbf{x}') : \mathbf{x}' \in B(C) \}, \tag{11}$$

where $I(\mathbf{x}')$ is the intensity of the pixel $\mathbf{x}'$.
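
As a hedged illustration of the background-pixel definition of equation (10), the following self-contained Python sketch computes $R$, $G$ and $K$ for a small image without masked pixels and returns the median background of each class. The function names `links_and_classes` and `local_background` are hypothetical, and the tie-breaking rule for equal intensities is an assumption of the sketch.

```python
from statistics import median


def links_and_classes(img):
    """Return (n, m, hood): the links n(x), the class peaks m(x), and a
    helper giving the neighbourhood of a pixel (the pixel itself included)."""
    h, w = len(img), len(img[0])

    def hood(p):
        r, c = p
        return [(rr, cc)
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))]

    pixels = [(r, c) for r in range(h) for c in range(w)]
    # Brightest pixel of each neighbourhood; ties go to the smallest (row, col).
    n = {p: max(hood(p), key=lambda q: (img[q[0]][q[1]], -q[0], -q[1]))
         for p in pixels}
    m = {}
    for start in pixels:
        path, p = [], start
        while p not in m and n[p] != p:
            path.append(p)
            p = n[p]
        end = m.get(p, p)
        for q in path + [p]:
            m[q] = end
    return n, m, hood


def local_background(img):
    """Median intensity of the background pixels B(C), keyed by class peak."""
    n, m, hood = links_and_classes(img)
    R = {p: 0 for p in n}                  # R(x): number of links ending at x
    for p in n:
        R[n[p]] += 1
    samples = {}
    for p in n:
        K = len(hood(p))                   # K(x) = K0(x) + 1
        G = sum(1 for q in hood(p) if m[q] == m[p])
        if R[p] == 0 and G < K:            # the two conditions of equation (10)
            samples.setdefault(m[p], []).append(img[p[0]][p[1]])
    return {peak: median(values) for peak, values in samples.items()}
```

A class that has no pixel satisfying equation (10) does not appear in the returned dictionary; this corresponds to the case of too few background pixels discussed in the next subsection.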



2.4.1.3 Collection of subsets. The above definitions of
equivalence classes and background pixels are quite robust,
but some cases need further treatment.
First, in extremely crowded fields, the number of background
pixels can be too small for a local background estimate.
Second, defocused or smeared stars may consist of several
separate local maxima that yield distinct equivalence
classes instead of one cohesive set of pixels. To overcome
these problems, we introduce some further definitions. An
equivalence class $C$ is degenerated if

$$R(\mathbf{m}(C)) < K(\mathbf{m}(C)). \tag{12}$$

In other words, degenerated partitions have their local maxima
on their boundary. For such a partition, one can define the
two sets of pixels

$$J_1(C) = \{\mathbf{x}' \in \mathrm{Image} : \|\mathbf{x}' - \mathbf{m}(C)\|_\infty = 1\} \tag{13}$$

and

$$J_2(C) = \{\mathbf{x}' \in J_1(C) : \mathbf{m}(\mathbf{x}') \ne \mathbf{m}(C)\}. \tag{14}$$

Let us denote the location of the maximum of a given set $J$
by

$$M(J) = \{\mathbf{x} \in J : \forall\, \mathbf{x}' \in J \;\; I(\mathbf{x}') \le I(\mathbf{x})\}, \tag{15}$$

where $I(\mathbf{x})$ is the intensity of the pixel $\mathbf{x}$. Using the above
definitions, we can coalesce a degenerated partition $C$
with one or more other partitions in two ways. Obviously,
$\mathbf{n}(\mathbf{m}(C)) = \mathbf{m}(C)$, so we redefine $\mathbf{n}(\mathbf{m}(C))$ by either

$$\mathbf{n}'(\mathbf{m}(C)) := M(J_1(C)) \tag{16}$$

if and only if $J_1(C)$ is not the empty set and $\mathbf{m}[M(J_1(C))] \ne \mathbf{m}(C)$, or

$$\mathbf{n}''(\mathbf{m}(C)) := M(J_2(C)) \tag{17}$$

if and only if $J_2(C)$ is not the empty set. Otherwise we do
not alter $\mathbf{n}(\mathbf{m}(C))$. We note that the latter expansion may
result in a larger number of coalesced sets: in the former
case it may happen that the maximum of the neighboring
pixels falls into the same class, while the latter case
excludes this by construction (see the definition of $J_2(C)$).

2.4.1.4 Prominence. In the case of highly defocused star
images, the PSF can be donut-shaped and a single star
may have several distinct (and not degenerated) maxima.
To coalesce such equivalence classes, we define the discrete
prominence, with almost the same properties as it is known
from topography. The prominence of a mountain peak in
topography (a.k.a. topographic prominence or autonomous
height) is defined as follows. For every path connecting the
peak to higher terrain, find the lowest point on that path,
which is a saddle point. The key saddle is defined as the
highest of these saddles, along all connecting paths. Then
the prominence is the difference between the elevation of
the peak and the elevation of the key saddle. This definition
cannot be directly applied to our discrete case, since the
number of possible connecting paths between two maxima
is an exponential function of the number of pixels, i.e. we
would not get an $\mathcal{O}(N)$ algorithm. Thus, we use the following
definition for the key saddle $\mathbf{s}$ of an equivalence class $C$:

$$\mathbf{s}(C) = \{\mathbf{x} \in C : G(\mathbf{x}) < K(\mathbf{x}) \text{ and } \forall\, \mathbf{x}' \in C \;\; G(\mathbf{x}') < K(\mathbf{x}') \Rightarrow I(\mathbf{x}') \le I(\mathbf{x})\}, \tag{18}$$

i.e. the key saddle is the brightest pixel on the boundary of the
class. Thus, the prominence of this class is

$$p(C) = I(\mathbf{m}(C)) - I(\mathbf{s}(C)). \tag{19}$$

Note that $p(C)$ is always non-negative and if $C$ is degenerated,
$p(C)$ is zero. The related classes $\mathcal{R}(C)$ of $C$ are defined
as

$$\mathcal{R}(C) = \{C' \in \mathrm{Classes} : \exists\, \mathbf{x}' \in C' \;\; \|\mathbf{x}' - \mathbf{s}(C)\|_\infty = 1\}. \tag{20}$$

We define the set of parent classes of $C$ as the set

$$P(C) = \{C' \in \mathcal{R}(C) : \forall\, C'' \in \mathcal{R}(C) \;\; I[\mathbf{m}(C'')] \le I[\mathbf{m}(C')] \text{ and } I[\mathbf{m}(C)] < I[\mathbf{m}(C')]\}. \tag{21}$$

The set of parent classes $P(C)$ can be empty if the class $C$ is
the most prominent one. If at least one parent class exists,
the relative prominence of $C$ is defined as

$$r(C) = \frac{p(C)}{I[\mathbf{m}(P(C))] - \langle B(P(C)) \rangle}, \tag{22}$$

and, by definition, it is always between 0 and 1. Since the
classes with low relative prominences are most likely parts
of a larger object that is dominated by the parent class (or,
moreover, by the parent of the parent class and so on), we
connect the classes whose relative prominence is below
a critical value $r_0$ to their parents. Namely, we alter $\mathbf{n}(\mathbf{s}(C))$
to one point of $P(C)$, say, $\mathbf{x}' \in P(C)$ where $\|\mathbf{s}(C) - \mathbf{x}'\|_\infty = 1$.
Note that for $r_0 = 0$ this algorithm yields the same collection
of partitions as the usage of the definition of $J_2(C)$ at the end
of the previous subsection, applied only to degenerated partitions.

Figure 9. Left panel: a stamp of a star, covering 32 x 32 pixels from a typical blurred KeplerCam image. Middle panel: the links and
equivalence classes generated from this stamp, using the basic algorithm without any coalescing. Right panel: the links and equivalence
classes generated from the stamp, when the partitions with zero prominence are joined to their neighboring partitions.

2.4.2 Coordinates, shape parameters and analytic models

In the previous sections we have discussed how astronomical
images can be partitioned in order to extract sets of pixels
that belong to the same source. Now we describe how these
partitions can be characterized, i.e. how one can determine
the centroid coordinates and the total flux of the source and
quantify its shape.

2.4.2.1 Weighted mean and standard deviation. The
easiest and fastest way to estimate the centroid
coordinates and the shape parameters of a source
is to calculate the statistical mean and standard deviation
of the pixel coordinates, weighted by the individual fluxes
after background subtraction. Let us consider a set of pixels
$C = \{\mathbf{x}_i\}$, where the pixel $\mathbf{x}_i$ has a flux (ADU value) of $f_i$,
while the background level $B$ of this source is calculated by
using equation (11). Then the weighted coordinates are

$$\langle \mathbf{x} \rangle = \frac{\sum_i (f_i - B)\, \mathbf{x}_i}{\sum_i (f_i - B)}, \tag{23}$$

while the statistical standard deviation of the coordinates is
described by the covariance matrix, defined as

$$\mathbf{S} = \frac{\sum_i (f_i - B)\,(\mathbf{x}_i - \langle \mathbf{x} \rangle) \circ (\mathbf{x}_i - \langle \mathbf{x} \rangle)}{\sum_i (f_i - B)}. \tag{24}$$

Let us denote the components of the matrix $\mathbf{S}$ by

$$\mathbf{S} = \begin{pmatrix} \Sigma + \Delta & \mathrm{K} \\ \mathrm{K} & \Sigma - \Delta \end{pmatrix}. \tag{25}$$

For objects that are not elongated, $\Delta = \mathrm{K} = 0$. It can
be shown that for elongated objects, the semimajor axis of
the best-fit ellipse (to the contours) has a position angle of
$\varphi = \frac{1}{2} \arg(\Delta, \mathrm{K})$ and an ellipticity of $\sqrt{\Delta^2 + \mathrm{K}^2}/\Sigma$. The size
of stellar profiles is commonly characterized by the "full
width at half maximum" (FWHM), which can be derived
from $(\Sigma, \Delta, \mathrm{K})$ as follows. Let us consider an elongated
two-dimensional Gaussian profile that results from the
convolution of a symmetric profile with the matrix

$$\begin{pmatrix} \sigma + \delta & \kappa \\ \kappa & \sigma - \delta \end{pmatrix}. \tag{26}$$

It can be shown that such a profile, described by $(\sigma, \delta, \kappa)$, has
a covariance of

$$\mathbf{S} = \begin{pmatrix} \sigma + \delta & \kappa \\ \kappa & \sigma - \delta \end{pmatrix}^2, \tag{27}$$

i.e. for such profiles, the matrix of equation (26) is the square
root of $\mathbf{S}$. Since the FWHM of a Gaussian
profile with a standard deviation $\sigma$ is $2\sigma\sqrt{2 \log 2} \approx 2.35\,\sigma$, one
can obtain the FWHM by calculating the square root of the
matrix defined in equation (25) and multiplying the trace of
the root (that is, $2\sigma$) by the factor 1.17. Therefore, for nearly
circular profiles, the FWHM can be well approximated by
$\approx 2.35\,\sqrt{\Sigma}$.

Finally, the total flux of the object is

$$f = \sum_i (f_i - B) \tag{28}$$

and the peak intensity is

$$A = \max_i\,(f_i - B). \tag{29}$$
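
The moment-based estimates of equations (23)-(29) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the fistar code: the function name `profile_moments` and the `(x, y, flux)` input format are hypothetical, `arg(Δ, K)` is taken to mean `atan2(K, Δ)`, and the covariance matrix is assumed to be positive definite (which may fail for very noisy sources). The trace of the matrix square root is obtained from the identity $\mathrm{tr}\sqrt{\mathbf{S}} = \sqrt{\mathrm{tr}\,\mathbf{S} + 2\sqrt{\det \mathbf{S}}}$, valid for symmetric positive definite 2 x 2 matrices.

```python
import math


def profile_moments(pixels, background):
    """Moment-based estimates for one source.

    `pixels` is a list of (x, y, flux) tuples, `background` is the level B.
    """
    w = [(x, y, f - background) for x, y, f in pixels]
    flux = sum(v for _, _, v in w)                       # equation (28)
    peak = max(v for _, _, v in w)                       # equation (29)
    cx = sum(x * v for x, _, v in w) / flux              # equation (23)
    cy = sum(y * v for _, y, v in w) / flux
    sxx = sum(v * (x - cx) ** 2 for x, _, v in w) / flux  # equation (24)
    syy = sum(v * (y - cy) ** 2 for _, y, v in w) / flux
    sxy = sum(v * (x - cx) * (y - cy) for x, y, v in w) / flux
    sigma = (sxx + syy) / 2                              # components of
    delta = (sxx - syy) / 2                              # equation (25)
    kappa = sxy
    det = sxx * syy - sxy ** 2
    trace_root = math.sqrt(sxx + syy + 2 * math.sqrt(det))
    return {
        "x": cx, "y": cy, "flux": flux, "peak": peak,
        "Sigma": sigma, "Delta": delta, "Kappa": kappa,
        "position_angle": 0.5 * math.atan2(kappa, delta),
        "ellipticity": math.hypot(delta, kappa) / sigma,
        "fwhm": math.sqrt(2 * math.log(2)) * trace_root,
    }
```

For a circular profile the returned FWHM reduces to $2\sqrt{2 \log 2}\,\sqrt{\Sigma}$, in agreement with the approximation quoted above.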



2.4.2.2 Analytic models. In order to characterize the
stellar profiles better, it is common to fit an analytic
model function to the pixels. Such a model has roughly
the same set of parameters: background level, flux (or peak
intensity), centroid coordinates and shape parameters. The
most widely used models are the Gaussian profile (symmetric
or elongated) and the Moffat profile. The Lorentz and/or
Voigt profiles are not used in the characterization of stellar
profiles, since these profiles are not integrable in two
dimensions.

Figure 10. Some analytic elongated Gaussian stellar profiles.
Each panel shows a contour plot for a profile where the sharpness
parameter $S = 1$ and either $|D| = 0.5$ or $|K| = 0.5$. Note that if
the Gaussian polynomial coefficients $D$ and/or $K$ are positive,
then the respective asymmetric covariance matrix elements $\Delta$
and/or $\mathrm{K}$ (and the asymmetric convolution parameters $\delta$ and/or
$\kappa$) are negative, and vice versa.

In the case of undersampled images, we found that the
profiles can be well characterized by Gaussian profiles;
therefore, in the practical implementations (see fistar and
firandom, Sec. 2.12.8 and Sec. 2.12.7) we focused on these models.
Namely, these implementations support three kinds of
analytic models, all of which are derived from the Gaussian function.
The first model is the symmetric Gaussian profile,
characterized by five parameters: the background level $B$, the peak
intensity $A$, the centroid coordinates $\mathbf{x}_0 = (x_0, y_0)$ and the
parameter $S$ that is defined as $S = \sigma^{-2}$, where $\sigma$ is the
standard deviation of the profile function. Thus, the model for
the flux distribution is

$$f_{\rm sym}(\mathbf{x}) = B + A \exp\left[-\frac{1}{2} S\, (\mathbf{x} - \mathbf{x}_0)^2\right]. \tag{30}$$

The second implemented model is the elongated Gaussian
profile, which is characterized by the above five parameters
extended with two additional parameters, resulting in a
flux distribution of

$$f_{\rm elong}(\mathbf{x}) = B + A \exp\left\{-\frac{1}{2}\left[S(\Delta x^2 + \Delta y^2) + D(\Delta x^2 - \Delta y^2) + K(2 \Delta x \Delta y)\right]\right\}, \tag{31, 32}$$

where $\Delta x = x - x_0$, $\Delta y = y - y_0$, and $D$ and $K$ are the two additional
parameters that show how the flux deviates from a symmetric
distribution. It is easy to show that the $(S, D, K)$ parameters
are related to the covariance parameters $(\Sigma, \Delta, \mathrm{K})$ as

$$\begin{pmatrix} S + D & K \\ K & S - D \end{pmatrix} = \begin{pmatrix} \Sigma + \Delta & \mathrm{K} \\ \mathrm{K} & \Sigma - \Delta \end{pmatrix}^{-1}. \tag{33}$$
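
To make the relation between the profile parameters and the covariance parameters concrete, here is a minimal Python sketch of the elongated Gaussian model of equations (31)-(32) and of the 2 x 2 matrix inversion of equation (33). The function names `f_elong` and `covariance_from_sdk` are hypothetical and are not taken from fistar or firandom; setting `D = K = 0` gives the symmetric model of equation (30).

```python
import math


def f_elong(x, y, B, A, x0, y0, S, D=0.0, K=0.0):
    """Elongated Gaussian flux model evaluated at the point (x, y)."""
    dx, dy = x - x0, y - y0
    q = S * (dx * dx + dy * dy) + D * (dx * dx - dy * dy) + K * (2 * dx * dy)
    return B + A * math.exp(-0.5 * q)


def covariance_from_sdk(S, D, K):
    """Invert [[S + D, K], [K, S - D]] and return (Sigma, Delta, Kappa),
    the components of the covariance matrix of equation (25)."""
    det = S * S - D * D - K * K
    if det <= 0:
        raise ValueError("profile matrix is not positive definite")
    # inverse = [[S - D, -K], [-K, S + D]] / det
    return S / det, -D / det, -K / det
```

The explicit inverse shows the sign behaviour noted in the caption of Fig. 10: a positive $D$ (or $K$) in the profile corresponds to a negative $\Delta$ (or $\mathrm{K}$) in the covariance matrix.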



The third model available in the implementations
describes a flux distribution that is called "deviated", since the
peak intensity is offset from the mean centroid coordinates.
Stellar profiles that can only be well characterized by such
a flux distribution model are fairly common among images
taken with fast focal ratio instruments, due to the strong
comatic aberration. Such a model function can be built from a
Gaussian flux distribution by multiplying the main function
by a polynomial:

$$f_{\rm dev}(\mathbf{x}) = B + A \exp\left[-\frac{1}{2} S (\Delta x^2 + \Delta y^2)\right] \left(1 + \sum_{k,l} P_{kl}\, \Delta x^k \Delta y^l\right). \tag{34}$$

In the summation of equation (34), $2 \le k + l \le M$, where $M$
is the maximal polynomial order and $P_{02} + P_{20}$ is constrained
to be 0. Therefore, for $M = 2$, 3 or 4 the above function
involves 2, 6 and 11 other parameters, respectively, in addition to the
5 parameters of the symmetric Gaussian profile. If $M = 2$,
the above polynomial is equivalent to the second order
expansion of the elongated Gaussian model if $P_{20} = -P_{02} =
-\frac{1}{2} D$ and $P_{11} = -K$. However, for $M = 2$ the peak intensity
is not offset from the mean centroid coordinates, therefore
in practice $M = 2$ is not used.

All of the model functions discussed above are nonlinear
in the centroid coordinates $\mathbf{x}_0$ and the shape parameters
$S$, $D$, $K$ or $P_{kl}$. Therefore, in a parameter fit, one can
use the Levenberg-Marquardt algorithm (Press et al. 1993),
since the parametric derivatives of the model functions can
easily be calculated, and using the statistical mean
coordinates and standard deviations as initial
values yields good convergence. Moreover, if the iterations
of the Levenberg-Marquardt algorithm fail to converge, this
is a good indicator that the source should be discarded from the list,
since it is more likely to be a hot pixel or a structure caused by
a cosmic ray event (both cosmic ray events and hot pixels are hard
to model with these analytic functions).

In the practice of HATNet and follow-up data reduction, we
use the above models as follows. In real-time applications,
for example when the guiding correction is based on
the astrometric solution, the derivation of profile centroid
coordinates is based on the weighted statistical mean of the
pixel coordinates (in this case, we are not even interested
in the shape parameters, just in the centroid coordinates). If
more precise coordinates are needed, for example when one
has to derive the individual astrometric solutions in order
to have a list of coordinates for photometry, the symmetric
Gaussian or the elongated Gaussian models are used.
The elongated model is also used when we characterize the
spatial variations of the stellar profiles. This is particularly
important when the optics are not properly aligned with the detector:
if the optical axis is not perpendicular to the plane of the
CCD chip, the spatial variations in the $D$ and $K$ parameters
show a linear trend across the image. If the optical axis
is set properly, the linear trend disappears. (Quadratic trends
in the $D$ or $K$ components may still be present even if the optical
axis is aligned properly. In this case, the magnitude of the
quadratic trends is proportional to the magnitude of the comatic
aberration or the focal plane curvature.) Finally, if we
need an analytic description of the stellar profiles that is
as precise as possible, it is worth using the deviated model.

Figure 11. Analytic models for stellar profiles. From left to right, the three panels show the contour plots for a symmetric Gaussian
profile, for an elongated Gaussian profile and for a deviated profile model with M = 4. Note that all three models have their peak intensity
at the coordinate (0, 0). In the plots the peak intensity is normalized to unity and the contours show the intensity levels with a step size
of 0.1. All of the plotted models have S = 1, while the other parameters (D, K and P_kl) have values around 0.1 - 0.2.
Because of the choice S = 1, all of the models plotted here have a FWHM of nearly 2.35.

2.4.3 Implementation

The algorithms for extracting stars and characterizing stellar
profiles have been implemented in a standalone binary
program named fistar, part of the fi/fihat package. All of
the analytic models described here are available in the
program firandom, whose main purpose is to generate artificial
images. The capabilities of both programs are discussed
in more detail in Sec. 2.12.

2.5 Astrometry

In the context of the reduction of astronomical images,
astrometry refers to basically two things. First, the role of finding
the astrometrical solution is to find the appropriate function
that maps the celestial coordinate system to the image
frame and vice versa. Second, the complete astrometrical
solution for any given image should identify the individual
sources (i.e. perform a "cross-matching"), mostly based on
a catalog that is assumed to be known in advance.

In principle, it is not necessary to have a list of the
sources found on the image or a pre-defined reference
to match it to. If the relative transformation between the current
image and a previously analyzed one can be determined using only
their pixel intensity information, and an astrometrical solution
is available for the previous image,
the two mappings can be composed to yield the astrometrical
transformation for the current one. Such
transformations are mostly compositions of a dilation, a small
rotation and a shift if the frames have been acquired
subsequently by the same instrumentation from the same stellar
field. Attempts at finding the relative transformation
based only on the pixel intensities have been made by
Thiébaut & Boër (2001).



In this section, a robust and fast algorithm is presented
for performing astrometry and source cross-identification on
two-dimensional point lists, such as between a catalogue and
an astronomical image, or between two images. The method
is based on minimal assumptions: the lists can be rotated,
magnified and inverted with respect to each other in an
arbitrary way. The algorithm is tailored to work efficiently on
wide fields with a large number of sources and significant
nonlinear distortions, as long as the distortions can be
approximated with linear transformations locally, over the
scale-length of the average distance between the points. The
procedure is based on symmetric point matching in a newly
defined continuous triangle space that consists of triangles
generated by an extended Delaunay triangulation.

2.5.1 Introduction

Cross-matching two two-dimensional point lists is a crucial
step in astrometry and source identification. The task
involves finding the appropriate geometrical transformation
that transforms one list into the reference frame of the other,
followed by finding the best matching point-pairs.

One of the lists usually contains the pixel coordinates
of sources in an astronomical image (e.g. point-like sources,
such as stars), while the other list can be either a reference
catalog with celestial coordinates, or it can also consist
of pixel coordinates that originate from a different source
of observation (another image). Throughout this section we
denote the reference (list) as $\mathcal{R}$, the image (list) as $\mathcal{I}$, and
the function that transforms the reference to the image as
$\mathcal{F}_{\mathcal{R}\to\mathcal{I}}$.

The difficulty of the problem is that in order to find
matching pairs, one needs to know the transformation, and
vice versa: to derive the transformation, one needs point-pairs.
Furthermore, the lists may not fully overlap in space,
and may have only a small fraction of sources in common.

By making simple assumptions on the properties of
$\mathcal{F}_{\mathcal{R}\to\mathcal{I}}$, however, the problem can be tackled. A very
specific case is when there is only a simple translation between
the lists, and one can use cross-correlation techniques (see
Phillips & Davis 1995) to find the transformation. We note
that a method proposed by Thiébaut & Boër (2001) uses the
whole image information to derive a transformation
(translation and magnification).

A more general assumption, typical of astronomical
applications, is that $\mathcal{F}_{\mathcal{R}\to\mathcal{I}}$ is a similarity transformation
(rotation, magnification, inversion, without shear), i.e.
$\mathcal{F}_{\mathcal{R}\to\mathcal{I}}(\mathbf{r}) = \lambda \mathbf{A} \mathbf{r} + \mathbf{b}$, where $\lambda$ is a nonzero scalar, $\mathbf{A}$ is an
orthogonal matrix, $\mathbf{b}$ is an arbitrary translation, and $\mathbf{r}$ is the
spatial vector of points. Exploiting the fact that geometrical
patterns remain similar after the transformation, more general
algorithms have been developed that are based on pattern
matching (Groth 1986; Valdes et al. 1995). The idea is that
the initial transformation is found with the aid of a specific set
of patterns that are generated from a subset of the points
in both $\mathcal{R}$ and $\mathcal{I}$. For example, the subset can be that of the
brightest sources, and the patterns can be triangles. With
the knowledge of this initial transformation, more points can
be cross-matched, and the transformation between the lists
can be iteratively refined. Some of these methods are
implemented as IRAF tasks in immatch (Phillips & Davis 1995).

The above pattern matching methods perform well as
long as the dominant term in the transformation is linear,
such as for astrometry of narrow field-of-view (FOV) images,
and as long as the number of sources is small (because
of the large number of patterns that can be generated,
see later). In the past decade of astronomy, with the
development of large format CCD cameras or mosaic imagers,
many wide-field surveys appeared, such as those looking for
transient events (e.g. ROTSE, Akerlof et al. 2000),
transiting planets (Chapter 1), or all-sky variability (e.g. ASAS,
Pojmanski 1997). In such surveys there are non-negligible, higher order
distortion terms in the astrometric solution that are due to,
for instance, the projection of celestial to pixel coordinates
and the properties of the fast focal ratio optical systems.
Furthermore, these images may contain ~ 10^ sources, and 
pattern matching is non-trivial. 

The presented algorithm is based on, and is a generalization
of, the above pattern matching algorithms. It is very
fast, and works robustly for wide-field imaging with minimal
assumptions. Namely, we assume that: i) the distortions are
non-negligible, but small compared to the linear term, ii)
there exists a smooth transformation between the reference
and image points, iii) the point lists have a considerable
number of sources in common, and iv) the transformation is
locally invertible.

This section has the following parts. First we describe
symmetric point matching in Sec. 2.5.2, before we go on
to the discussion of finding the transformation (Sec. 2.5.3).
The software implementation and its performance on a large
and inhomogeneous dataset are demonstrated in Sec. 2.5.4.



2.5.2 Symmetric point matching 

First, let us assume that $\mathcal{F}_{\mathcal{R}\to\mathcal{I}}$ is known. To find point-pairs
between $\mathcal{R}$ and $\mathcal{I}$, one should first transform the reference
points to the reference frame of the image: $\mathcal{R}' = \mathcal{F}_{\mathcal{R}\to\mathcal{I}}(\mathcal{R})$.
Now it is possible to perform a simple symmetric point
matching between $\mathcal{R}'$ and $\mathcal{I}$. One point ($R_1 \in \mathcal{R}'$) from the
first and one point ($I_1 \in \mathcal{I}$) from the second set are treated
as a pair if the closest point to $R_1$ is $I_1$ and the closest point
to $I_1$ is $R_1$. This requirement is symmetric by definition and
excludes such cases when e.g. the closest point to $R_1$ is $I_1$,
but there exists an $R_2$ that is even closer to $I_1$, etc.
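
The symmetry requirement can be written down directly. The following Python sketch is a brute-force illustration, with a cost proportional to the product of the two list lengths, and not the sorted search discussed below; the names `nearest` and `symmetric_match` are hypothetical.

```python
def nearest(p, points):
    """Index of the element of `points` closest to p (Euclidean distance)."""
    return min(range(len(points)),
               key=lambda j: (points[j][0] - p[0]) ** 2
                             + (points[j][1] - p[1]) ** 2)


def symmetric_match(ref, img):
    """Return the index pairs (j, k) for which img[k] is the closest image
    point to ref[j] AND ref[j] is the closest reference point to img[k].

    `ref` is assumed to be already transformed to the image frame.
    """
    if not ref or not img:
        return []
    to_img = [nearest(r, img) for r in ref]
    to_ref = [nearest(i, ref) for i in img]
    return [(j, k) for j, k in enumerate(to_img) if to_ref[k] == j]


ref = [(0.0, 0.0), (10.0, 0.0), (0.4, 0.0)]
img = [(0.1, 0.0), (10.2, 0.1), (50.0, 50.0)]
pairs = symmetric_match(ref, img)   # [(0, 0), (1, 1)]
```

In the example, `ref[2]` is left unpaired because `img[0]` has a closer reference point, and `img[2]` is left unpaired because its nearest reference point already has a closer image point.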

In one dimension, finding the point of a given list nearest
to a specific point ($x$) can be implemented as a binary
search. Let us assume that the point list with $N$ points is
sorted in ascending order. This has to be done only once,
at the beginning, and using the quicksort algorithm, for
example, the required time scales on average as $\mathcal{O}(N \log N)$.
Then $x$ is compared to the median of the list: if it is less than
the median, the search can be continued recursively in the
first $N/2$ points; if it is greater than the median, the second
half is used. At the end only one comparison is needed
to find out whether $x$ is closer to its left or right neighbor,
so in total $1 + \log_2(N)$ comparisons are needed, which is an
$\mathcal{O}(\log N)$ function of $N$. Thus, the total time, including the
initial sorting, also goes as $\mathcal{O}(N \log N)$.
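
A one-dimensional nearest-point search of this kind can be sketched with the binary search of Python's standard `bisect` module. The function name `nearest_sorted` is hypothetical, and the list is assumed to be sorted beforehand.

```python
from bisect import bisect_left


def nearest_sorted(xs, x):
    """Index of the element of the ascending list `xs` that is nearest to x."""
    if not xs:
        raise ValueError("empty list")
    i = bisect_left(xs, x)          # O(log N) binary search
    if i == 0:
        return 0
    if i == len(xs):
        return len(xs) - 1
    # One final comparison decides between the left and the right neighbour.
    return i if xs[i] - x < x - xs[i - 1] else i - 1


xs = sorted([5.0, 1.0, 9.0, 3.0])   # the sorting is done only once
k = nearest_sorted(xs, 3.9)         # index 1, i.e. the point 3.0
```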

As regards a two-dimensional list, let us assume again
that the points are sorted in ascending order by their $x$
coordinates (the initial sorting requires $\mathcal{O}(N \log N)$ time), and that they are
spread uniformly in a square of unit area. Finding the nearest
point in the $x$ coordinate also requires $\mathcal{O}(\log N)$ comparisons;
however, the point found is presumably not the nearest
in Euclidean distance. The typical distance between
neighboring points is of the order of $1/\sqrt{N}$, and thus we have to
compare points within a strip of this width and unit
height, meaning $\mathcal{O}(\sqrt{N})$ comparisons. Therefore, a
symmetric point matching between two
catalogs in two dimensions requires $\mathcal{O}(N^{3/2} \log N)$ time in total.

We note that finding the closest point within a given
set of points is also known as the nearest neighbor problem (for
a summary see Gionis 2002, and references therein). It is
possible to reduce the computation time in two dimensions
to $\mathcal{O}(N \log N)$ with the aid of Voronoi diagrams and Voronoi
cells, but we have not implemented such an algorithm in our
matching codes.



2.5.3 Finding the transformation 

Let us go back to finding the transformation between $\mathcal{R}$ and
$\mathcal{I}$. The first, and most crucial, step of the algorithm is to
find an initial "guess" $\mathcal{F}^{(0)}_{\mathcal{R}\to\mathcal{I}}$ for the transformation, based
on a variant of triangle matching. Using $\mathcal{F}^{(0)}_{\mathcal{R}\to\mathcal{I}}$, $\mathcal{R}$ is
transformed to $\mathcal{I}$, symmetric point matching is done, and the
paired coordinates are used to further refine the
transformation (leading to $\mathcal{F}^{(i)}_{\mathcal{R}\to\mathcal{I}}$ in iteration $i$) and to increase the
number of matched points iteratively. A major part of this
section is devoted to finding the initial transformation.

2.5.3.1 Triangle matching. Triangle matching was proposed
earlier by Groth (1986) and Stetson (1989) (see also Valdes et al. 1995)
for the initial "guess" of the transformation.
The total number of triangles that can be formed
using $N$ points is $N(N-1)(N-2)/6$, an $\mathcal{O}(N^3)$ function
of $N$. As this can be an overwhelming number, one can
resort to using a subset of the points for the vertices of the
triangles to be generated. One can also limit the parameters
of the triangles, for example by excluding elongated or large (small)
triangles.

As triangles are uniquely defined by three parameters,
for example the lengths of the three sides, these parameters
(or their appropriate combinations) naturally span a
three-dimensional triangle space. Because our assumption is that
$\mathcal{F}_{\mathcal{R}\to\mathcal{I}}$ is dominated by the linear term, to first order
approximation there is a single scalar magnification between
$\mathcal{R}$ and $\mathcal{I}$ (besides the rotation, chirality and translation).
It is possible to reduce the triangle space to a normalized,
two-dimensional triangle space ($(T_x, T_y) \in \mathcal{T}$), whereby the
original size information is lost. Similar triangles (with or
without taking into account a possible flip) can be
represented by the same points in this space, facilitating triangle
matching between $\mathcal{R}$ and $\mathcal{I}$.

2.5.3.2 Triangle spaces. There are multiple ways of
deriving normalized triangle spaces. One can define a "mixed"
normalized triangle space $\mathcal{T}^{(\rm mix)}$, where the coordinates are
insensitive to inversion between the original coordinate lists,
i.e. all similar triangles are represented by the same point
irrespective of their chirality (Valdes et al. 1995):

$$T_x^{(\rm mix)} = p/a, \tag{35}$$
$$T_y^{(\rm mix)} = q/a, \tag{36}$$

where $a$, $p$ and $q$ are the sides of the triangle in descending
order. Triangles in this space are shown in the left panel
of Fig. 12. Coordinates in the mixed triangle space are
continuous functions of the sides (and therefore of the spatial
coordinates of the vertices of the original triangle), but the
orientation information is lost. Because we assumed that
$\mathcal{F}_{\mathcal{R}\to\mathcal{I}}$ is smooth and bijective, no local inversions and flips
can occur. In other words, $\mathcal{R}$ and $\mathcal{I}$ are either flipped or
not with respect to each other, but chirality does not have
a spatial dependence, and there are no "local spots" that
are mirrored. Since the mixed triangle space ignores chirality,
using its coordinates can yield false triangle matchings (between
triangles of opposite orientation) that can lead to
an inaccurate initial transformation, or the match may even
fail. Thus, for large sets of points and triangles it is more
reliable to fix the orientation of the transformation. For
example, first assume the coordinates are not flipped, perform
a triangle match, and if this match is unsatisfactory, then
repeat the fit with flipped triangles.

This leads to the definition of an alternative, "chiral"
triangle space:

$$T_x^{(\rm chir)} = b/a, \tag{37}$$
$$T_y^{(\rm chir)} = c/a, \tag{38}$$

where $a$, $b$ and $c$ are the sides in counter-clockwise order
and $a$ is the longest side. In this space similar triangles with
different orientations have different coordinates. The
shortcoming of $\mathcal{T}^{(\rm chir)}$ is that it is not continuous: a small
perturbation of an isosceles triangle can result in a new coordinate
that is at the upper rightmost edge of the triangle space.

Figure 12. The position of triangles in the mixed and the chiral
triangle spaces. The exact position of a given triangle is
represented by its center of gravity. Note that in the mixed triangle
space some triangles with identical side ratios but different
orientation overlap. The dashed line shows the boundaries of the
triangle space. The dotted-dashed line represents the right
triangles and separates obtuse and acute ones.

In the following, we show that it is possible to define a
parametrization that is both continuous and preserves
chirality. Flip the chiral triangle space in the right panel of
Fig. 12 along the $T_x + T_y = 1$ line. This transformation
moves the equilateral triangle into the origin. Following this,
apply a radial magnification of the whole space to move the
$T_x + T_y = 1$ line to the $T_x^2 + T_y^2 = 1$ arc (the magnification
factor is not constant: 1 along the direction of the x- and y-axes
and $\sqrt{2}$ along the $T_x = T_y$ line). Finally, apply an azimuthal
slew by a factor of 4 to identify the $T_y = 0, T_x > 0$ and
$T_x = 0, T_y > 0$ edges of the space. To be more specific, let us
denote the sides as in $T^{(\mathrm{chir})}$: a, b and c in counter-clockwise
order where a is the longest, and define

$\alpha = 1 - b/a,$ (39)

$\beta = 1 - c/a.$ (40)

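Using these values, it is easy to prove that by using the
definitions of the following variables:

$x_1 = \frac{\alpha(\alpha+\beta)}{\sqrt{\alpha^2+\beta^2}},$ (41)

$y_1 = \frac{\beta(\alpha+\beta)}{\sqrt{\alpha^2+\beta^2}},$ (42)

$x_2 = x_1^2 - y_1^2,$ (43)

$y_2 = 2 x_1 y_1,$ (44)

one can define the triangle space coordinates as:

$T_x^{(\mathrm{cont})} = \frac{x_2^2 - y_2^2}{(\alpha+\beta)^3} = \frac{(\alpha+\beta)(\alpha^4 - 6\alpha^2\beta^2 + \beta^4)}{(\alpha^2+\beta^2)^2},$ (45)

$T_y^{(\mathrm{cont})} = \frac{2 x_2 y_2}{(\alpha+\beta)^3} = \frac{4(\alpha+\beta)\alpha\beta(\alpha^2 - \beta^2)}{(\alpha^2+\beta^2)^2}.$ (46)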

The above defined $T^{(\mathrm{cont})}$ continuous triangle space has
many advantages. It is a continuous function of the sides for
all non-singular triangles, and also preserves chirality
information. Furthermore, it spans a larger area, and the
misidentification of triangles (which may be very densely packed) is
reduced. Some triangles in this space are shown in Fig. 13.
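The parametrization of Eqs. (39)-(46) can be illustrated with a short Python sketch. It is not the code of the reduction package: the function name and the way the sides are ordered (edges traversed counter-clockwise, starting with the longest one) are assumptions made for this illustration, and the closed forms of Eqs. (45)-(46) are evaluated directly.

```python
import math

def continuous_triangle_coords(p0, p1, p2):
    """Return (T_x, T_y) of Eqs. (45)-(46) for a triangle given by three (x, y) vertices.

    Assumption of this sketch: the sides a, b, c are the edges met when walking
    around the triangle counter-clockwise, starting with the longest edge a.
    """
    # A positive cross product means p0, p1, p2 are already counter-clockwise.
    cross = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p1[1] - p0[1]) * (p2[0] - p0[0])
    if cross == 0:
        raise ValueError("degenerate (collinear) triangle")
    if cross < 0:
        p1, p2 = p2, p1
    edges = [math.dist(p0, p1), math.dist(p1, p2), math.dist(p2, p0)]
    k = max(range(3), key=lambda n: edges[n])
    a, b, c = edges[k], edges[(k + 1) % 3], edges[(k + 2) % 3]
    alpha = 1.0 - b / a            # Eq. (39)
    beta = 1.0 - c / a             # Eq. (40)
    rho2 = alpha * alpha + beta * beta
    if rho2 == 0.0:
        return (0.0, 0.0)          # equilateral triangle: the origin
    s = alpha + beta
    t_x = s * (alpha**4 - 6 * alpha**2 * beta**2 + beta**4) / rho2**2
    t_y = 4 * s * alpha * beta * (alpha**2 - beta**2) / rho2**2
    return (t_x, t_y)

# A 3-4-5 right triangle and its mirror image.
original = continuous_triangle_coords((0, 0), (4, 0), (0, 3))
mirrored = continuous_triangle_coords((0, 0), (-4, 0), (0, 3))
print(original, mirrored)
```

In this sketch a mirrored triangle keeps its T_x and changes the sign of T_y, while scaling or rotating the vertices leaves both coordinates unchanged, which is the behavior the continuous triangle space is meant to provide.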

2.5.3.3 Optimal triangle sets. As mentioned before,
the total number of triangles that can be formed from
N points is $\approx N^3/6$. Wide-field images typically contain
$O(10^4)$ points or more, and using the total number of triangles
that can be generated (a complete triangle list) is impractical
for the following reasons. First, storing and handling such
a large number of triangles with typical computers is
inconvenient. To give an example, a full triangulation of 10,000
points yields $\sim 1.7 \times 10^{11}$ triangles.

Figure 13. Triangles in the continuous triangle space as defined
by Eqs. (45)-(46). We show the same triangles as earlier, in Fig. 12,
for the $T^{(\mathrm{mix})}$ and $T^{(\mathrm{chir})}$ triangle spaces. Equilateral triangles
are centered in the origin. The dotted-dashed line refers to the
right triangles, and divides the space into acute (inside) and obtuse
(outside) triangles. Isosceles triangles are placed on the x-axis
(where $T_y^{(\mathrm{cont})} = 0$).

Second, this complete triangle list includes many triangles
that are not optimal to use. For example, large triangles
can be significantly distorted in $\mathcal{I}$ with respect to $\mathcal{R}$, and
thus are represented by substantially different coordinates
in the triangle space. The size of optimal triangles is
governed by two factors: the distortion of large triangles, and
the uncertainty of triangle parameters for small triangles
that are comparable in size to the astrometric errors of the
vertices.

To make an estimate of the optimal size for triangles,
let us denote the characteristic size of the image by D, the
astrometric error by $\delta$, and the size of a selected triangle
as L. For the sake of simplicity, let us ignore the distortion
effects of a complex optical assembly, and estimate
the distortion factor $f_d$ in a wide-field imager as the
difference between the orthographic and gnomonic projections
(see Calabretta & Greisen 2002):

$f_d \approx |(\sin(d) - \tan(d))/d| \approx |1 - \cos(d)|,$ (47)

where d is the radial distance as measured from the center
of the field. For the HATNet frames ($d \approx D \approx 6^\circ$ to
the corners) this estimate yields $f_d \approx 0.005$. The distortion
effects yield an error of $f_d L/D$ in the triangle space: the
bigger the triangle, the more significant the distortion. For
the same triangle, astrometric errors cause an uncertainty
of $\delta/L$ in the triangle space that decreases with increasing
L. Making the two errors equal,

$\frac{f_d L}{D} = \frac{\delta}{L},$ (48)

an optimal triangle size can be estimated by

$L_{\mathrm{opt}} = \sqrt{\frac{\delta D}{f_d}}.$ (49)

In our case D = 2048 pixels (or $6^\circ$), $f_d \approx 0.005$ and the
centroid uncertainty for an I = 11 star is $\delta = 0.01$, so the
optimal size of the triangles is $L_{\mathrm{opt}} \approx 60 - 70$ pixels.
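As a quick numerical check of Eqs. (47) and (49), the following Python sketch evaluates the distortion factor and the optimal triangle size for the values quoted above. The function names are chosen here for illustration only, and the centroid uncertainty is assumed to be expressed in pixels.

```python
import math

def distortion_factor(d_rad):
    """Eq. (47): relative difference of the orthographic and gnomonic projections."""
    return abs((math.sin(d_rad) - math.tan(d_rad)) / d_rad)

def optimal_triangle_size(image_size, centroid_error, f_d):
    """Eq. (49): triangle size where distortion and astrometric errors are equal."""
    return math.sqrt(centroid_error * image_size / f_d)

f_d = distortion_factor(math.radians(6.0))        # d = 6 degrees to the corners
l_opt = optimal_triangle_size(2048.0, 0.01, 0.005)
print(f"f_d = {f_d:.4f}, L_opt = {l_opt:.1f} pixels")
```

With the rounded value $f_d = 0.005$ the optimal size falls inside the 60 - 70 pixel range quoted in the text.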

Figure 14. Triangulations of some randomly distributed points:
the left panel shows the Delaunay triangulation (60 triangles in
total), the right panel exhibits the $\ell = 1$ extended triangulation
(312 triangles) of the same point set.

Third, dealing with many triangles may result in a triangle
space that is over-saturated by the large number of
points, and may yield unexpected matchings of triangles. In
all definitions of the previous subsection, the area of the
triangle space is approximately unity. Having triangles with an
error of $\sigma$ in triangle space and assuming them to have a
uniform distribution, allowing a $3\sigma$ spacing between them, and
assuming $\sigma = \delta/L_{\mathrm{opt}}$, the number of triangles is limited
to:

$T_{\mathrm{opt}} \approx \frac{1}{(3\sigma)^2} \approx \frac{1}{9} \left( \frac{L_{\mathrm{opt}}}{\delta} \right)^2 = \frac{D}{9 f_d \delta}.$ (50)

In our case (see values of D, $f_d$ and $\delta$ above) the former
equation yields $T_{\mathrm{opt}} \approx 2 \times 10^6$ triangles. Note that this is 5
orders of magnitude smaller than a complete triangulation
($O(10^{11})$).
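The two triangle counts compared above can be reproduced with a few lines of Python. This is only an order-of-magnitude sketch under the values of D, $f_d$ and $\delta$ quoted in the text; the function names are illustrative.

```python
import math

def complete_triangle_count(n_points):
    """Number of triangles in a complete triangle list of n_points points."""
    return math.comb(n_points, 3)

def optimal_triangle_count(image_size, centroid_error, f_d):
    """Eq. (50): number of triangles the triangle space can hold at 3-sigma spacing."""
    return image_size / (9.0 * f_d * centroid_error)

n_complete = complete_triangle_count(10_000)
t_opt = optimal_triangle_count(2048.0, 0.01, 0.005)
print(f"complete list: {n_complete:.2e} triangles")
print(f"T_opt:         {t_opt:.2e} triangles")
print(f"ratio:         {n_complete / t_opt:.1e}")
```

The estimate of Eq. (50) comes out at the $10^6$ level, several orders of magnitude below the size of the complete triangle list.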

2.5.3.4 The extended Delaunay triangulation. Delaunay
triangulation (see Shewchuk 1996) is a fast and robust
way of generating a triangle mesh on a point set. The
Delaunay triangles are disjoint triangles where the
circumcircle of any triangle contains no other points from any other
triangle. This is also equivalent to the most efficient
exclusion of distorted triangles in a local triangulation. For a
visual example of a Delaunay triangulation of a random set of
points, see the left panel of Fig. 14.

Following Euler's theorem (also known as the polyhedron
formula), one can calculate the number of triangles in
a Delaunay triangulation of N points:

$T_D = 2N - 2 - C,$ (51)

where C is the number of edges on the convex hull of the
point set. For large values of N, $T_D$ can be estimated as
2N, as 2 + C is negligible. Therefore, if we select a subset of
points (from $\mathcal{R}$ or $\mathcal{I}$) where neighboring ones have a distance
of $L_{\mathrm{opt}}$, we get a Delaunay triangulation with approximately
$2D^2/L_{\mathrm{opt}}^2$ triangles. The D, $\delta$ and $f_d$ values for HAT images
correspond to ~6000 triangles, i.e. 3000 points. In our
experience, this yields very fast matching, but it is not robust
enough for general use, for the following reasons.

First, Delaunay triangulation is very sensitive to the removal
of a point from the star list. According to the polyhedron
formula, on average, each point has 6 neighboring points
and belongs to 6 triangles. Because of observational effects
or unexpected events, the number of points fluctuates in the
list. To mention a few examples, it is customary to build up
$\mathcal{I}$ from the brightest stars in an image, but stars may get
saturated or fall on bad columns, and thus disappear from
the list. Star detection algorithms may find sources
depending on the changing full-width at half maximum (FWHM)
of the frames. Transients, variable stars or minor planets
can lead to additional sources on occasions. In general, if
one point is removed, 6 Delaunay triangles are destroyed
and 4 new ones are formed that are totally disjoint from the
6 original ones (and therefore they are represented by
substantially different points in the triangle space). Removing
one third of the generating points might completely change
the triangulation.

Second, and more important, there is no guarantee that
the spatial density of points in $\mathcal{R}$ and $\mathcal{I}$ is similar. For
example, the reference catalog is retrieved for stars with
different magnitude limits than those found on the image. If the
number of points in common in $\mathcal{R}$ and $\mathcal{I}$ is only a small
fraction of the total number of points, the triangulations on the
reference and image have no common triangles. Third, the
number of triangles in a Delaunay triangulation ($T_D$)
is definitely smaller than $T_{\mathrm{opt}}$; i.e. the triangle space could
support more triangles without much confusion.

Therefore, it is beneficial to extend the Delaunay
triangulation. A natural way of extension can be made as follows.
Define a level $\ell$ and for any given point (P) select all points
from the point set of N points that can be connected to P
via at most $\ell$ edges of the Delaunay triangulation. Following
this, one can generate the full triangulation of this set
and append the new triangles to the whole triangle set. This
procedure can be repeated for all points in the point set at
fixed $\ell$. For self-consistency, the $\ell = 0$ case is defined as the
Delaunay triangulation itself. If all points have 6 neighbors,
the number of "extended" triangles per data point is:

$T_\ell = (3\ell^2 + 3\ell + 1)(3\ell^2 + 3\ell)(3\ell^2 + 3\ell - 1)/6$ (52)

for $\ell > 0$, i.e. this extension introduces $O(\ell^6)$ new
triangles. Because some of the extended triangles are repetitions
of other triangles from the original Delaunay triangulation
and from the extensions of other points, the final
dependence only goes as $O(T_D \ell^2)$. We note that our software
implementation is slightly different, and the expansion
requires $O(N \ell^2)$ time and automatically results in a triangle
set where each triangle is unique. To give an example, for
N = 10,000 points the Delaunay triangulation gives 20,000
triangles, the $\ell = 1$ extended triangulation gives ~115,000
triangles, $\ell = 2$ some ~347,000 triangles, $\ell = 3$ ~875,000
and $\ell = 4$ ~1,841,000 triangles, respectively. The extended
triangulation is advantageous not only because of the larger
number of triangles and the better chance for matching, but also
because there is a bigger variety in size, which enhances matching
if the input and reference lists have different spatial density.
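The extension procedure can be sketched in Python as follows. This is a naive illustration and, as noted above, not the approach of the actual software: it assumes that the Delaunay triangulation is already available as an adjacency map (point index to neighboring indices) plus a list of index triples, and it simply stores every triple in a set to remove repetitions. All names are hypothetical, and degenerate (collinear) triples are not filtered out.

```python
from collections import deque
from itertools import combinations

def points_within(adjacency, start, level):
    """Points connected to `start` via at most `level` Delaunay edges (breadth-first)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        point, depth = frontier.popleft()
        if depth == level:
            continue
        for neighbor in adjacency[point]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

def extended_triangles(adjacency, delaunay_triangles, level):
    """Level-`level` extended triangulation as a set of sorted index triples."""
    triangles = {tuple(sorted(t)) for t in delaunay_triangles}
    if level == 0:
        return triangles                      # the Delaunay triangulation itself
    for point in adjacency:
        local = sorted(points_within(adjacency, point, level))
        triangles.update(combinations(local, 3))   # full triangulation of the local set
    return triangles

def hexagonal_patch(radius):
    """Adjacency of a regular lattice where interior points have 6 neighbors."""
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
    points = {(q, r) for q in range(-radius, radius + 1)
              for r in range(-radius, radius + 1) if abs(q + r) <= radius}
    return {p: [(p[0] + dq, p[1] + dr) for dq, dr in steps
                if (p[0] + dq, p[1] + dr) in points] for p in points}

lattice = hexagonal_patch(4)
for level in (1, 2, 3):
    n_local = len(points_within(lattice, (0, 0), level))
    print(level, n_local, n_local * (n_local - 1) * (n_local - 2) // 6)
```

On the regular lattice the local point count is $3\ell^2 + 3\ell + 1$, so the number of triples per point agrees with Eq. (52).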

2.5.3.5 Matching the triangles in triangle space. If
the triangle sets for both the reference and the input list are
known, the triangles can be matched in the normalized
triangle space (where they are represented by two-dimensional
points) using the symmetric point matching as described in
Sec. 2.5.2.

In the next step we create an $N_R \times N_I$ "vote" matrix
V, where $N_R$ and $N_I$ are the number of points in the
reference and input lists that were used to generate the
triangulations, respectively. The elements of this matrix have
an initial value of 0. Each matched triangle corresponds to
3 points in the reference list (identified by $r_1$, $r_2$, $r_3$) and 3
points in the input list ($i_1$, $i_2$ and $i_3$). Knowing these indices,
the matrix elements $V_{r_1 i_1}$, $V_{r_2 i_2}$ and $V_{r_3 i_3}$ are incremented.
The magnitude of this increment (the vote) can depend on
the distances of the matching triangles in the triangle space:
the closer they are, the higher votes these points get. In
our implementation, if $N_T$ triangles are matched in total,
the closest pair gets $N_T$ votes, the second closest pair gets
$N_T - 1$ votes, and so on.

Having built up the vote matrix, we select the greatest 
elements of this matrix, and the appropriate points referring 
to these row and column indices are considered as matched 
sources. We note that not all of the positive matrix elements 
are selected, because elements with smaller votes are likely 
to be due to misidentifications. We found that in practice 
the upper 40% of the matrix elements yield a robust match. 
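The voting scheme can be sketched in Python as below. The data layout is an assumption of this illustration: each matched triangle pair is given as a tuple of its triangle-space distance, the three reference indices and the three input indices, with vertices stored in corresponding order. The "upper 40%" is interpreted here as the top 40% of the positive matrix elements ranked by vote, which is one possible reading of the text.

```python
def build_vote_matrix(n_ref, n_inp, matched_triangles):
    """Fill the N_R x N_I vote matrix from matched triangle pairs.

    matched_triangles: list of (distance, (r1, r2, r3), (i1, i2, i3)).
    The closest pair gives N_T votes, the next one N_T - 1, and so on.
    """
    votes = [[0] * n_inp for _ in range(n_ref)]
    ranked = sorted(matched_triangles, key=lambda match: match[0])
    n_matched = len(ranked)
    for rank, (_, ref_triple, inp_triple) in enumerate(ranked):
        weight = n_matched - rank
        for r, i in zip(ref_triple, inp_triple):
            votes[r][i] += weight
    return votes

def select_pairs(votes, fraction=0.4):
    """Keep the given upper fraction of the positive matrix elements."""
    positive = [(v, r, i) for r, row in enumerate(votes)
                for i, v in enumerate(row) if v > 0]
    positive.sort(reverse=True)
    keep = int(round(fraction * len(positive)))
    return [(r, i) for _, r, i in positive[:keep]]

matches = [
    (0.01, (0, 1, 2), (0, 1, 2)),   # good match
    (0.02, (1, 2, 3), (1, 2, 3)),   # good match
    (0.50, (0, 1, 3), (2, 3, 0)),   # spurious match, far apart in triangle space
]
votes = build_vote_matrix(4, 4, matches)
pairs = select_pairs(votes)
print(pairs)
```

In this toy example the spurious triangle contributes only single votes, so its point pairs fall below the selection cut.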

2.5.3.6 The unitarity of the transformations. If
an initial set of the possible point-pairs is known from
triangle matching, one can fit a smooth function (e.g. a
polynomial) that transforms the reference set to the input points.
Our assumption was that the dominant term in our
transformation is the similarity transformation, which implies that
the homogeneous linear part of it should be an almost unitary
operator. After the transformation is determined, it is
useful to measure how much we diverge from this assumption.
As mentioned earlier (Sec. 2.5.1), similarity transformations
can be written as

$\mathbf{r}' = \lambda \mathbf{A} \mathbf{r} + \mathbf{b},$ (53)

where $\lambda \neq 0$, and the a, b, c, d components of the matrix
$\mathbf{A}$ are the sine and cosine of a given rotational angle,
i.e. a = d and b = -c.

If we separate the homogeneous linear part of the
transformation, as described by a matrix similar to that in
equation (53), it is a combination of rotation and dilation with
possible inversion if $|a| \approx |d|$ and $|b| \approx |c|$. We can define the
unitarity of a matrix as:

$\Lambda^2 := \frac{(a \mp d)^2 + (b \pm c)^2}{a^2 + b^2 + c^2 + d^2},$ (54)

where the $\pm$ indicates the definition for regular and inverting
transformations, respectively. For a combination of rotation
and dilation, $\Lambda$ is zero; for a distorted transformation,
$\Lambda \approx f_d \ll 1$.
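A minimal Python sketch of the unitarity measure of Eq. (54) is given below. The function name and the `inverting` flag are choices made for this illustration; the upper signs of Eq. (54) are used for regular transformations and the lower signs for inverting ones.

```python
import math

def unitarity(a, b, c, d, inverting=False):
    """Lambda of Eq. (54) for the 2x2 linear part ((a, b), (c, d))."""
    if inverting:
        numerator = (a + d) ** 2 + (b - c) ** 2
    else:
        numerator = (a - d) ** 2 + (b + c) ** 2
    return math.sqrt(numerator / (a * a + b * b + c * c + d * d))

# Rotation by 30 degrees combined with a dilation by a factor of 2.
angle = math.radians(30.0)
rotation = (2 * math.cos(angle), -2 * math.sin(angle),
            2 * math.sin(angle), 2 * math.cos(angle))
print(unitarity(*rotation))                      # rotation with dilation
print(unitarity(1.0, 0.005, 0.0, 1.0))           # small shear
print(unitarity(1.0, 0.0, 0.0, -1.0))            # mirrored, tested as regular
print(unitarity(1.0, 0.0, 0.0, -1.0, inverting=True))
```

A pure rotation with dilation gives a vanishing $\Lambda$, a small shear gives a value of the order of the shear itself, and a mirrored transformation evaluated with the regular definition gives a value of order unity, in line with the behavior described in the text.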

The $\Lambda$ unitarity gives a good measure of how well the
initial transformation was determined. It happens
occasionally that the transformation is erroneous, and in our
experience, in these cases $\Lambda$ is not just larger than the
expected value of $f_d$, but it is ~1. This enables fine-tuning
of the algorithm, such as changing the chirality of the
triangle space, or adding further iterations until a satisfactory
$\Lambda$ is reached.

Footnote on the removal of generating points (Sec. 2.5.3.4):
Imagine a honeycomb structure where all central points
of the hexagons are added or removed: these two constructions
generate disjoint Delaunay triangulations.

Footnote on the almost unitary operator (Sec. 2.5.3.6):
Here $A A^{+} = I$, where $A^{+}$ is the adjoint of A and I is the
identity, i.e. A is an orthogonal transformation with possible
inversion and magnification.

2.5.3.7 Point matching in practice. In practice,
matching points between the reference $\mathcal{R}$ and the image $\mathcal{I}$
proceeds as follows:

(i) Generate two triangle sets $T_R$ and $T_I$ on $\mathcal{R}$ and $\mathcal{I}$,
respectively:

(a) In the first iteration, generate only Delaunay
triangles.

(b) Later, if necessary, extended triangulation can be
generated with increasing levels of $\ell$.

(ii) Match these two triangle sets in the triangle space
using symmetric point matching.

(iii) Select some possible point-pairs using a vote
algorithm (yielding $N_0$ pairs).

(iv) Derive the initial smooth transformation $\mathcal{F}^{(0)}_{\mathcal{R}\to\mathcal{I}}$ using
a least-squares fit.

(a) Check the unitarity of $\mathcal{F}^{(0)}_{\mathcal{R}\to\mathcal{I}}$.

(b) If it is greater than a given threshold ($O(f_d)$),
increase $\ell$ and go to step (i)/(b). If the unitarity is less than
this threshold, proceed to step (v).

(c) If we reached the maximal allowed $\ell$, try the
procedure with triangles that are flipped with respect to each
other between the image and reference, i.e. switch the
chirality of the $T^{(\mathrm{cont})}$ triangle space.

(v) Transform $\mathcal{R}$ using this initial transformation to the
reference frame of the image ($\mathcal{R}' = \mathcal{F}^{(0)}_{\mathcal{R}\to\mathcal{I}}(\mathcal{R})$).

(vi) Perform a symmetric point matching between $\mathcal{R}'$ and
$\mathcal{I}$ (yielding $N_1 > N_0$ pairs).

(vii) Refine the transformation based on the greater
number of pairs, yielding transformation $\mathcal{F}^{(i)}_{\mathcal{R}\to\mathcal{I}}$, where i is the
iteration number.

(viii) If necessary, repeat steps (v), (vi) and (vii) iteratively,
increase the number of matched points, and refine the
transformation.

For most astrometric transformations and distortions it
holds that locally they can be approximated with a
similarity transformation. At a reasonable density of points on
$\mathcal{R}$ and $\mathcal{I}$, the triangles generated by a (possibly extended)
Delaunay triangulation are small enough not to be affected
by the distortions. The crucial step is the initial triangle
matching, and due to the use of local triangles, it proves to
be a robust procedure. It should be emphasized that $\mathcal{F}^{(i)}_{\mathcal{R}\to\mathcal{I}}$
can be any smooth transformation, for example an affine
transformation with small shear, or a polynomial
transformation of any reasonable order. The optimal value of the order
depends on the magnitude of the distortion. The detailed
description of fitting such models and functions can be found in
various textbooks (see e.g. Chapter 15 in Press et al. 1992).
It is noteworthy that in step (vii) one can perform a weighted
fit with possible iterative rejection of n-$\sigma$ outlier points.
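The refinement fit with outlier rejection can be illustrated with the following Python sketch. It makes several simplifying assumptions that are not taken from the text: the transformation is a plain affine one (the lowest-order case of the polynomial fit), all points have equal weight, the rejection uses the rms of the residuals, and the names and default thresholds are hypothetical.

```python
import math

def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[k]] for k, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def fit_affine(ref, inp):
    """Least-squares affine map ref -> inp, returned as ((a, b, e), (c, d, f))."""
    basis = [(x, y, 1.0) for x, y in ref]
    normal = [[sum(p[j] * p[k] for p in basis) for k in range(3)] for j in range(3)]
    coeffs = []
    for axis in (0, 1):
        rhs = [sum(p[j] * q[axis] for p, q in zip(basis, inp)) for j in range(3)]
        coeffs.append(tuple(solve_linear(normal, rhs)))
    return tuple(coeffs)

def apply_affine(coeffs, point):
    (a, b, e), (c, d, f) = coeffs
    x, y = point
    return (a * x + b * y + e, c * x + d * y + f)

def fit_with_rejection(ref, inp, n_sigma=3.0, max_iter=5, min_rms=1e-9):
    """Refit repeatedly, dropping pairs whose residual exceeds n_sigma times the rms."""
    pairs = list(zip(ref, inp))
    for _ in range(max_iter):
        coeffs = fit_affine([p for p, _ in pairs], [q for _, q in pairs])
        resid = [math.dist(apply_affine(coeffs, p), q) for p, q in pairs]
        rms = math.sqrt(sum(r * r for r in resid) / len(resid))
        if rms < min_rms:
            return coeffs, pairs          # numerically exact fit, nothing to reject
        kept = [pq for pq, r in zip(pairs, resid) if r <= n_sigma * rms]
        if len(kept) == len(pairs) or len(kept) < 3:
            return coeffs, pairs
        pairs = kept
    return fit_affine([p for p, _ in pairs], [q for _, q in pairs]), pairs
```

For example, a 5 x 5 grid of reference points mapped by a rotation and a shift, with one misidentified pair displaced by 50 pixels, loses exactly that pair in the first rejection round and the second fit recovers the input coefficients.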



2.5.4 Implementation

The coordinate matching and coordinate transforming
algorithms are implemented in two stand-alone binary programs
as a part of the complete data reduction package. The
program named grmatch (Sec. 2.12.10) matches point sets,
including triangle space generation, triangle matching,
symmetric point matching and polynomial fitting, that is, steps
(i) through (iv) in Sec. 2.5.3.7. The other program, grtrans
(Sec. 2.12.9), transforms coordinate lists using the
transformation coefficients that are output by grmatch. The grtrans
code is also capable of fitting a general polynomial
transformation between point-pair lists if they are paired or matched
manually or by external software. We should note that in
the case of degeneracy, e.g. when all points are on a perfect
lattice, the match fails.

By combining grmatch and grtrans, one can easily
derive the World Coordinate System (WCS) information for a
FITS data file. Output of WCS keywords is now fully
implemented in grtrans, following the conventions of the package
wcstools (http://tdc-www.harvard.edu/wcstools/, see Mink 2002).
Such information is very useful for manual analysis with
well-known FITS viewers (e.g. ds9, see Joye & Mandel 2003).
For a more detailed description of WCS see Calabretta &
Greisen (2002), and on the representation of distortions see
Shupe et al. (2005).



2.6 Registering images

In order to have data ready for image subtraction, the
images themselves have to be transformed to the same
reference system (i.e. the images have to be registered). This
transformation is a continuous mapping between the
reference coordinate system and the system of each of the
individual images. In practice, all of the frames are taken by
the same instrument, so this transformation is always nearly
the identity, affected only by slight rotation, shift and small
distortions (for instance due to differential refraction as a given
field of the sky is observed at different air masses, or small
dilations that may occur due to the change of focus). In
principle, the whole registration process should satisfy the
following requirements. First, the relative transformations are
expected to be as small as possible. Since dilations are
negligible, the combination of rotation and shift can be described
by an affine linear transformation whose determinant is 1.
However, the distortions caused by differential refraction
require higher order transformations to be described
properly. Second, the flux transformation must preserve the
brightnesses of the sources. Namely, any area on the image
referenced by the same absolute (e.g. celestial) coordinates must
contain exactly the same amount of flux before and after
the transformation is done. Third, the composition of the
geometric transformations must be as "commutative" as
possible with subsequent image transformations. Namely,
having two $\mathbb{R}^2 \to \mathbb{R}^2$ mappings, e.g. f and g, and denoting
the transformed version of image I by $T_f[I]$, we want to
keep $\| T_{f \circ g}[I] - T_f[T_g[I]] \|$ as small as possible. Here "small"
means that the difference between the images $T_{f \circ g}[I]$ and
$T_f[T_g[I]]$ should be comparable with the overall noise level.
In this section the details of this image registration process
are discussed.



Figure 15. Astrometric residuals as a function of observational conditions. The left panel shows the strong correlation between the
astrometric residuals and the stellar profile sizes (FWHM $\approx 2.35/\sqrt{S}$): the sharper the stars are on the image, the smaller the astrometric residual is. The middle panel
shows the astrometric residuals as a function of the elevation of the Moon. Obviously, if the Moon is below the horizon, the residuals are independent
of this "negative elevation"; however, if the Moon is above the horizon, the effect of the stronger background illumination can be seen
well: as the elevation of the Moon increases, the residuals also become larger. The right panel shows the residuals as a function of
the field elevation. No correlation between these can be seen.



2.6.1 Choosing a reference image 

In practice, the reference image is chosen to be a "nice"
image, with high signal-to-noise ratio and therefore with
small astrometric residual. Since the signal-to-noise ratio is
affected by both the background noise and the fluxes of the
individual stars, images taken near culmination, after
astronomical twilight and when the Moon is below the horizon
are a proper choice in most cases. Moreover, in the
case of HATNet, sharper images tend to have smaller
astrometric residuals because the merging of nearby stars is
also smaller and the background noise affects fewer pixels. In
the panels of Fig. 15, the astrometric residuals are shown as
a function of the previously discussed observational
conditions. As one can expect, both the image sharpness
(characterized by the stellar profile FWHMs) and the Moon
elevation clearly influence the astrometric residuals.
However, the effect of the field elevation itself is negligible: the
variation in the airmass between ~1.02 and ~2.55 (i.e.
$12^\circ \lesssim z \lesssim 67^\circ$) causes no practical fluctuation in the
astrometric residuals.

We should note here that the whole process of the image 
subtraction photometry needs not only a specific image to 
be an astrometric reference but a couple of images for pho- 
tometric reference as well. As we will see later on, the selec- 
tion criteria for convolution reference images are roughly the 
same as for an astrometric reference. Hence, in practice, the 
astrometric reference image is always one of the convolution 
reference frames. 



2.6.2 Relative transformations 

Once the reference frame for registration has been cho- 
sen, the appropriate geometric transformations between this 
frame and the other frames should be derived (prior to the 
image transformation itself). To derive this geometric trans- 
formation, one can proceed using one of the following meth- 
ods: 



• Assuming the absolute astrometrical solutions to be 
known (i.e. the mappings between the celestial and pixel 
coordinates), the solution for the reference frame can be 
composed with the inverse of the solution for the current 
image. 

• Assuming that the sources on both the reference and 
the current images are extracted and identified with a pre- 
viously declared external catalog, one can match these iden- 
tifier - pixel coordinate lists and fit a geometric transforma- 
tion involving the matched coordinate pairs. 

• If no astrometric information (neither an absolute
solution nor a source identification) is known in
advance, one can directly employ the triangulation-based
point matching algorithm itself, as it was presented earlier
(Sec. 2.5).



In practice, the first option is sub-optimal. Since the
absolute astrometric transformation has higher order distortions
than a relative transformation, such a composition easily
leads to numerical round-off errors. Moreover, the direct
composition of two polynomials with an order of 6 (which
is needed for a proper astrometric solution, see Sec. 2.1.3,
Table 1 or Sec. 2.5) yields a polynomial with an order of 12,
while a relative transformation between two images needs
only a polynomial with a degree of 3 - 4 (see also Table 1).
The naive omission of higher order polynomial coefficients
does not result in the "best fit", and this best fit depends on the
domain of the polynomial; therefore this polynomial
degradation is always an ambiguous step.

Both the second and third options mentioned above are
efficient and can be used in practice. The last option, involving
point matching to determine the relative transformation,
has an advantage: on cloudy images where the derivation of the
absolute astrometric solution failed, the chance to obtain a
successful relative transformation is higher. This is mostly
because of both the lack of large-scale distortions and the
smaller polynomial degree required for such transformations.

Figure 16. In order to perform a spatial image transformation
with exact flux conservation, the intensity level of the original image
should be integrated on the quadrilaterals defined by the mapping
function. Each quadrilateral is the projection of one of the pixels
in the target image while the dots represent the projections of
the pixel corners. The above image shows the pixel grid of the
original image and the grid of quadrilaterals for a transformation
that shrinks the image by a factor of nearly two.



2.6.3 Conserving flux 

Even if the spatial image transformation does not signifi- 
cantly shrink or enlarge the image, pixels of the target image 
usually are not mapped exactly to the pixels of the original 
image (and vice versa). Therefore, some sort of interpola- 
tion is needed between the adjacent pixel values in order to 
obtain an appropriate transformed image. Since the spatial 
transformation is followed by the steps of convolution and 
photometry, exact flux conservation is a crucial issue. If the 
interpolation is performed naively by multiplying the inter- 
polated pixel values with the Jacobian determinant of the 
spatial mapping, the exact flux conservation property is not 
guaranteed at all. It is even more relevant in the cases where
the transformation includes definite dilation or shrinking,
i.e. the Jacobian determinant significantly differs from unity.

In order to overcome the problem of flux conserving
transformations, we have implemented a method based on the
analytical integration of surfaces that are determined by
the pixel values. These surfaces are then integrated on the
quadrilaterals whose coordinates are derived by mapping the
pixel coordinates on the target frame to the system of the
original frame. An example is shown in Fig. 16, where the
transformation includes a shrink factor of nearly two (thus
the Jacobian determinant is ~1/4). In practice, two kinds of
surfaces are used in the original image. The simplest kind
of surface is the two-dimensional step function, defined
explicitly by the discrete pixel values. Obviously, if the area of
the intersections of the quadrilaterals and the pixel squares
is derived, the integration is straightforward: it is equivalent
to a multiplication of this intersection area by the actual
pixel value.
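The step-function case can be sketched in Python as follows. This is an illustration and not the fitrans code: it assumes that pixel (i, j) covers the square [i, i+1] x [j, j+1], that the image is stored as a list of rows `image[j][i]`, and that the quadrilateral is convex. The intersection polygons are obtained with Sutherland-Hodgman clipping and their areas with the shoelace formula, which are choices made here for brevity.

```python
import math

def clip_to_pixel(polygon, i, j):
    """Clip a polygon to the pixel square [i, i+1] x [j, j+1] (Sutherland-Hodgman)."""
    boundaries = ((0, i, True), (0, i + 1, False), (1, j, True), (1, j + 1, False))
    for axis, limit, keep_greater in boundaries:
        clipped = []
        for k, current in enumerate(polygon):
            previous = polygon[k - 1]
            cur_in = current[axis] >= limit if keep_greater else current[axis] <= limit
            prev_in = previous[axis] >= limit if keep_greater else previous[axis] <= limit
            if cur_in != prev_in:
                # The edge crosses the boundary: add the intersection point.
                t = (limit - previous[axis]) / (current[axis] - previous[axis])
                clipped.append((previous[0] + t * (current[0] - previous[0]),
                                previous[1] + t * (current[1] - previous[1])))
            if cur_in:
                clipped.append(current)
        polygon = clipped
        if not polygon:
            break
    return polygon

def polygon_area(polygon):
    """Unsigned polygon area from the shoelace formula."""
    total = 0.0
    for k, (x1, y1) in enumerate(polygon):
        x0, y0 = polygon[k - 1]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def integrate_step_surface(image, quad):
    """Integral of the step-function surface of `image` over the quadrilateral `quad`."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    flux = 0.0
    for j in range(max(0, math.floor(min(ys))), min(len(image), math.ceil(max(ys)))):
        for i in range(max(0, math.floor(min(xs))), min(len(image[0]), math.ceil(max(xs)))):
            flux += image[j][i] * polygon_area(clip_to_pixel(quad, i, j))
    return flux

image = [[1.0, 2.0],
         [3.0, 4.0]]
whole = [(0, 0), (2, 0), (2, 2), (0, 2)]              # covers the entire image
shifted = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]
print(integrate_step_surface(image, whole), integrate_step_surface(image, shifted))
```

A quadrilateral covering the whole image returns the total flux of the image, and a unit square shifted by half a pixel in both directions returns the mean of the four overlapped pixels. Parts of a quadrilateral that fall off the image contribute nothing in this sketch.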

A more sophisticated interpolation surface can be
defined as follows. On each pixel, at the position (i, j), we
define a biquadratic function of the fractional pixel
coordinates ($\delta x$, $\delta y$), namely

$f^{(ij)}(\delta x, \delta y) = \sum_{k=0}^{2} \sum_{l=0}^{2} C^{(ij)}_{kl} \, \delta x^k \, \delta y^l.$ (55)

For each pixel, we define nine coefficients, $C^{(ij)}_{kl}$. We derive
these coefficients by both constraining the integral of the
surface at the pixel to be equal to the pixel value itself, i.e.

$\int_0^1 \int_0^1 f^{(ij)}(\delta x, \delta y) \, d\delta x \, d\delta y = P_{ij},$ (56)



and requiring the joint function F(x, y) describing the
surface

$F(x, y) = f^{([x][y])}(\{x\}, \{y\})$ (57)

to be continuous (here [x] denotes the integer part of x and
{x} denotes the fractional part, i.e. x = [x] + {x}). This
continuity is equivalent to

$f^{(i+1,j)}(0, y) = f^{(ij)}(1, y),$ (58)

$f^{(i,j+1)}(x, 0) = f^{(ij)}(x, 1),$ (59)

$f^{(i-1,j)}(1, y) = f^{(ij)}(0, y),$ (60)

$f^{(i,j-1)}(x, 1) = f^{(ij)}(x, 0),$ (61)

for all $0 \le x, y \le 1$. Since f is a biquadratic function of the
fractional pixel coordinates (x, y), it can be shown that the
above four equations imply 8 additional constraints for each
pixel. At the boundaries of the image, we can define any
feasible boundary condition, for instance, by fixing the partial
derivatives $\partial F/\partial x$ and $\partial F/\partial y$ of the surface F(x, y) to be
zero at the left/right and the lower/upper edge of the image,
respectively. It can be shown that the integral property of
equation (56), the continuity constrained by equations (58)-(61)
and the boundary conditions define a unique solution
for the $C^{(ij)}_{kl}$ coefficients. This solution exists for arbitrary
values of the $P_{ij}$ pixel intensities (note that the complete
problem of obtaining the $C^{(ij)}_{kl}$ is a system of linear equations).
Since the integrals of the F(x, y) surface on the
quadrilaterals are linear combinations of polynomial integrals, the pixel
intensities on interpolated images can be obtained easily,
although it is a bit more computationally expensive.

We should note here that if the transformation is a
simple shift (i.e. there is no dilation, rotation or higher
order distortion at all), the two previously discussed
interpolation schemes yield the same results as the classic bilinear
and bicubic (Press et al. 1992) interpolation.

In practice, during the above interpolation procedure
bad pixels (for instance, pixels that are saturated or have any
other undesired mask) are ignored in the determination of the
$C^{(ij)}_{kl}$ coefficients, and
any interpolated pixels on the target image inherit the
underlying masks of the pixels that intersect their respective
quadrilaterals. Pixels on the target frame that are mapped
off the original image have a special mask which marks them
as "outer" ones (see also Sec. 2.3.2). This yields a transparent
processing of the images: for instance, in the case of photometry,
an aperture that falls completely inside the image but
intersects one or more pixels having this "outer" mask yields the
same photometry quality flag as an aperture that is (partially
or completely) off the image. See also Sec. 2.7 or Sec. 2.12.13
for additional details.



2.6.4 Implementation 

The core algorithms of the interpolations discussed here are
implemented in the program fitrans (Sec. 2.12.11). This
program performs the spatial image transformation,
involving both the naive and the integration-based methods and
both the bilinear and bicubic/biquadratic interpolations.
The transformation itself is the output of the grmatch or
grtrans programs (see also Sec. 2.12.10 and Sec. 2.12.9).

2.7 Photometry 

The main step in a reduction pipeline intended to measure
fluxes of objects on the sky is the photometry. All of the
steps discussed before are crucial to prepare the image
for photometry. Thus at this stage we should have
a properly calibrated and registered image (registration is
needed only if we intend to perform image subtraction based
photometry), and we have to know the positions of the sources
of interest. For each source, the CCD photometry process for a
single image yields only raw instrumental fluxes. In order to
estimate the intrinsic flux of a target object, ground-based
observations use nearby comparison objects with known fluxes. The
difference in the raw instrumental fluxes between the target
source and the source with known flux is then converted,
involving smooth transformations, to obtain the ratios
between the intrinsic flux values. Such a smooth transformation
might be the identity transformation (this is the simplest
of all photometry methods, known as single-star comparison
photometry) or some higher order transformation for
correcting various gradients (mostly in the transparency; due
to the large field of view, the airmass and therefore the
extinction at the different corners of the image might differ
significantly). Even more sophisticated transformations can
also be performed in order to correct additional filter and
instrumentation effects caused by the intrinsic color (and
color differences) of the various sources. Corrections
can also be made in order to transform the brightnesses into
standard photometric systems. The latter is known as
standard transformation and in almost all cases it requires
measurements for standard areas as well (Landolt 1992). Since
for all objects, transparency variations cause a flux increase or
decrease proportional to the intrinsic flux itself, the
transformations mentioned above are done on a logarithmic scale
(in practice, a magnitude scale). For instance, in the case of
single-star comparison photometry, the difference between
the intrinsic magnitudes and the raw instrumental
magnitudes is constant (to be precise, only if the spectra of the
two stars are exactly the same and the two objects are close
enough to neglect the difference in the atmospheric
transparency). In this section some aspects of the raw
and instrumental photometric methods are detailed, with the
exception of topics related to the photometry on convolved
and/or subtracted images. As mentioned above, the
first step of the photometry is the derivation of the raw
instrumental magnitudes of the objects or sources of
interest.



2.7.1 Raw instrumental magnitudes

In principle, raw magnitudes are derived from two quantities. First, the total flux of the CCD pixels is determined around the object centroid. The total flux can be determined in three ways:

• If a region is assigned to the object of interest, one has to sum the flux of the pixels inside this region. The region is generally defined to be within a fixed distance from the centroid (the so-called aperture), but in the case of diffuse or non-point sources, more sophisticated methods have to be used to define the boundary of the region. The algorithms implemented in the program SExtractor (Bertin & Arnouts 1996) focus on photometry of such sources. In the following we are interested only in stars and/or point-like sources.

• If the source profile can be modelled with some kind of analytic function (see Sec. 2.4.2) or an empirical model function (e.g. the PSF of the image), one can fit such a model surface to the pixels that are supposed to belong to the object (e.g. to the pixels inside a previously defined aperture or one of the isophotes). From the fitted parameters, the integral of the surface is derived, and this integral is then treated as the flux of the object. This method of photometry is known as PSF photometry.

• The previous two methods can be combined as follows. After fitting the model function, the best fit surface is subtracted from the pixel values and aperture photometry is performed on this residual. The flux derived from the residual photometry is then added to the flux derived from the best fit surface parameters, yielding the total flux for the given object. It is not necessary that the pixels used for surface fitting are the same as the pixels inside the aperture.

It should be mentioned here that whichever of the above methods is used to perform the photometry, the uncertainties should be estimated carefully.

After the total flux of the object has been estimated, one has to remove the flux contribution of the background. This is essential in the case of aperture photometry; however, if a profile function is fitted to the pixel values, the contribution of the background is added to the model function as an additional free parameter. If the photometric aperture is a circular region, the background is usually measured in a concentric annulus whose inner radius is larger than the radius of the aperture. If the field is not crowded, the background level is simply the mean or median of the pixel values found in the annulus. On the other hand, if the field is extremely crowded, the determination of the background level might even be impossible. A solution for this issue can be either profile (PSF) fitting or photometry based on differential images (see Sec. 2.9). Note that on highly crowded fields, apertures overlap significantly. One advantage of the profile/PSF model fitting method is the ability to fit adjacent profiles simultaneously.

In practice, additional data are obtained and reported for a single raw instrumental photometry measurement, such as:






Figure 17. Weight matrix for a circular aperture centered at $(x_0, y_0) = (4.2, 4.9)$ and having a radius of $r_0 = 2.45$ pixels. The numbers written in the squares show the area of the intersection of the given square and the circle.

• Noise estimations, based on the Poisson statistics of the flux values and the uncertainty of the background level determination; optionally, scintillation noise can also be estimated (Young 1967).

• Characteristics of the background: the total number of pixels used to derive the background level, the number of outlier pixels (such as pixels of nearby stars or cosmic ray events) rejected from the background determination procedure, and so on.

• Quality flags, indicating for example the various pixel masks that happen to fall in the aperture.



2.7.2 Formalism for the aperture photometry 

In practice, aperture photometry derives the raw instrumental magnitudes as follows. Let us consider an image $I$ with the pixel intensities $I(x, y) \equiv I_{xy}$, where $(x, y)$ are the respective pixel coordinates. Let us define the weight matrix for the circular aperture centered at $(x_0, y_0)$ and having a radius of $r_0$ as

$$A_{xy}^{x_0,y_0,r_0} = \int_{x}^{x+1} \mathrm{d}x' \int_{y}^{y+1} \mathrm{d}y' \, \Theta\left[r_0^2 - (x' - x_0)^2 - (y' - y_0)^2\right], \tag{62}$$

where $\Theta(\cdot)$ is the Heaviside step function (see also Fig. 17). By definition, $A_{xy}$ is unity inside the aperture, has some value between 0 and 1 at the boundary (depending on the area of overlap), and is zero outside the aperture. The total raw instrumental flux $f_{\rm total}$ is then simply derived as

$$f_{\rm total} = \sum_{x,y} A_{xy} I_{xy}. \tag{63}$$

The background level in the annulus having inner and outer radii of $r_1$ and $r_2$, respectively, around the centroid $(x_0, y_0)$ can be derived as

$$B = \frac{\sum_{x,y} I_{xy} \left(A_{xy}^{x_0,y_0,r_2} - A_{xy}^{x_0,y_0,r_1}\right)}{\pi \left(r_2^2 - r_1^2\right)}. \tag{64}$$

The raw flux of the object in the aperture after the background level removal is

$$f = f_{\rm total} - \pi r_0^2 B = \sum_{x,y} A_{xy} I_{xy} - \pi r_0^2 B. \tag{65}$$

Although this discussion seems rather trivial, the same formalism will be used later on in Sec. 2.9 while considering the details of photometry performed on subtracted images.
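
The formalism above can be illustrated with a short Python sketch. It is not the fiphot implementation: the function names, the sub-pixel sampling used to approximate the overlap areas, and the use of the summed annulus weights (rather than the analytic annulus area) as the normalisation of the background are all assumptions made for illustration only.

```python
import numpy as np

def aperture_weights(shape, x0, y0, r0, sub=20):
    """Approximate the weight matrix of a circular aperture.

    Pixel (x, y) is taken to cover the square [x, x+1) x [y, y+1); its weight
    is the fraction of sub-pixel sample points that fall inside the circle of
    radius r0 centred at (x0, y0). `sub` (hypothetical parameter) sets the
    number of samples per pixel side.
    """
    ny, nx = shape
    offs = (np.arange(sub) + 0.5) / sub                      # sample offsets inside a pixel
    xs = (np.arange(nx)[:, None] + offs[None, :]).ravel()    # nx*sub sample abscissae
    ys = (np.arange(ny)[:, None] + offs[None, :]).ravel()    # ny*sub sample ordinates
    inside = (xs[None, :] - x0) ** 2 + (ys[:, None] - y0) ** 2 <= r0 ** 2
    # average the sub-samples that belong to the same pixel
    return inside.reshape(ny, sub, nx, sub).mean(axis=(1, 3))

def aperture_flux(image, x0, y0, r0, r1, r2, sub=20):
    """Background-subtracted raw flux in a circular aperture of radius r0,
    with the background taken as the weighted mean in the annulus r1..r2."""
    a0 = aperture_weights(image.shape, x0, y0, r0, sub)
    annulus = (aperture_weights(image.shape, x0, y0, r2, sub)
               - aperture_weights(image.shape, x0, y0, r1, sub))
    f_total = np.sum(a0 * image)                        # total flux in the aperture
    background = np.sum(annulus * image) / annulus.sum()  # mean level per pixel
    return f_total - background * a0.sum()              # remove background under the aperture
```

A median or an outlier-rejecting estimator of the annulus pixels could replace the weighted mean used here, which is the more common choice when nearby stars contaminate the annulus.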

2.7.3 Magnitude transformations 

As mentioned earlier, the raw magnitude lists yielded by the photometry on subsequent frames have to be transformed to the same reference system in order to have instrumental and/or standard magnitudes for our objects. For a given frame, let us assume that we have a list of stars with raw magnitudes $m^{(i)}$, located at the positions $(x_i, y_i)$ on the image. Let us denote the raw magnitudes of these objects on a certain reference frame by $m_0^{(i)}$. For images obtained by small field-of-view instrumentation, the difference $m^{(i)} - m_0^{(i)}$ depends only on the color of the star, due to the wavelength dependence of the atmospheric extinction. For images obtained by larger field-of-view optics, the difference between the instrumental magnitudes also depends on the $(x, y)$ centroid positions due to the gradient in the extinction level throughout the image. In practice, both the spatial and color dependence of the differential magnitudes can be well characterized by polynomials. Such a transformation is quantified as

$$m^{(i)} - m_0^{(i)} = \sum_{c=0}^{N_C} \sum_{0 \le k+l \le N_S} K_{ckl} \left[C^{(i)}\right]^c x_i^k y_i^l, \tag{66}$$

where $C^{(i)}$ is some color index (e.g. $V - I$ or $J - K$) of the star, and $N_C$ and $N_S$ are the maximal polynomial orders in the color and in the spatial coordinates, respectively. The coefficients $K_{ckl}$ can be obtained by the linear least squares method, provided that each star is weighted appropriately. The weights assigned to the stars can be derived from both the photon noise and the light curve residuals. In practice, the above mentioned magnitude transformation is done iteratively. First, the instrumental magnitude lists for each frame are transformed to the instrumental system of one of the frames. This reference frame is usually selected from the "best" frames, i.e. one that has been obtained at low airmass and in good generic atmospheric conditions, has small astrometric residuals and on which the illumination by the Moon and/or the sky background (due to twilight) is the smallest. After each magnitude list has been transformed, light curves are gathered and the individual scatters are derived for each star. The transformation is then repeated while the contribution of each star is weighted by the light curve scatter. This kind of weighting gives lower weight to stars whose scatter has been underestimated (due to unresolved remaining systematics, for instance) or which have intrinsic but unknown variability. Of course, stars with known variability should be excluded from the fit, including our target stars as well.
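
A minimal sketch of one pass of such a transformation fit is given below, assuming that the coordinates and color indices are available as NumPy arrays. The function names and the default polynomial orders are hypothetical; the sketch does not reproduce how lfit or the other programs of the package organise the fit.

```python
import numpy as np

def design_matrix(x, y, color, n_color, n_spatial):
    """Columns color**c * x**k * y**l for 0 <= c <= n_color and 0 <= k + l <= n_spatial."""
    cols = []
    for c in range(n_color + 1):
        for k in range(n_spatial + 1):
            for l in range(n_spatial + 1 - k):
                cols.append(color ** c * x ** k * y ** l)
    return np.column_stack(cols)

def fit_transformation(m, m_ref, x, y, color, sigma, n_color=1, n_spatial=2):
    """Weighted linear least squares fit of the magnitude difference m - m_ref.

    sigma holds the per-star uncertainties (photon noise on the first pass,
    light curve scatter on later passes); each star is weighted by 1/sigma**2.
    Returns the polynomial coefficients and the transformed magnitudes.
    """
    a = design_matrix(x, y, color, n_color, n_spatial)
    w = 1.0 / sigma                      # scaling the rows by 1/sigma gives 1/sigma^2 weights
    coeff, *_ = np.linalg.lstsq(a * w[:, None], (m - m_ref) * w, rcond=None)
    return coeff, m - a @ coeff
```

In an iterative scheme as described above, the function would be called again with `sigma` replaced by the light curve scatter of each star, and known variables would be removed from the input arrays beforehand. Rescaling the pixel coordinates to a range of order unity before the fit helps to keep the least squares problem well conditioned.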



2.7.4 Implementation 

In the fi/fihat package, the photometric algorithms discussed above are implemented as follows. Aperture photometry and the related features, such as background level determination, noise estimation, assignment of quality flags and conversion of fluxes to instrumental magnitudes, are implemented in the program fiphot (see Sec. 2.12.13). The point-spread functions are derived by the program fistar (Sec. 2.12.8). Moreover, this program is also capable of fitting the derived PSFs to the individual detected profiles. Currently, none of these programs deals explicitly with profile fit residuals; however, the output of fistar can be used as an input for firandom (both for analytical profile models and PSFs) to create model images. Such model images can be subtracted from the original images, yielding complete residual images. The program lfit is another alternative for fitting analytic stellar profile models that are not supported by fistar/firandom. Magnitude transformations between two frames can also be performed with the program lfit. See also Sec. 2.13 and Fig. 29 for the practical details of how these programs can be applied to real observations.



2.8 Image convolution and subtraction 

In a generic variability survey, such as the HATNet project, we are primarily focusing on the detection and quantification of source brightness variations. The idea behind the photometry methods involving image subtraction is to derive the part of the flux that varies from image to image. It is rather easy to see that simple per-pixel arithmetic subtraction is not sufficient to derive the difference between two images. First, the centroid positions of the stars are different on each image. The magnitude of this difference depends on the precision and the systematic variations in the mount tracking, as well as on other side effects such as field rotation and differential refraction. However, it is rather easy to overcome this problem by registering the images to the same reference system (Sec. 2.6). Second, the background level may vary from image to image. Changes in the background can be modelled by adding a constant or some slowly varying function to the (convolved) image. Third, the stellar profiles also vary from frame to frame, due to variations in the seeing or in the focus. In order to have the smallest residual between two images, one should not only register them to the same reference system, but on at least one of the images the profiles should be transformed to match the profiles of the other image. This profile transformation is performed as a convolution, namely the image $R$ is transformed to $R'$ as

$$R' = B + R \star K, \tag{67}$$

where $K$ is the convolution kernel and the operator $(\cdot) \star (\cdot)$ denotes the convolution. For (astronomical) images that are sampled on discrete pixels, the operation of convolution is defined as

$$(R \star K)_{xy} = \sum_{-B_K \le i,j \le B_K} R_{(x-i)(y-j)} K_{ij}. \tag{68}$$

Here, the convolution kernel $K_{ij}$ is sampled on a grid of $(2B_K + 1) \times (2B_K + 1)$ pixels and $R_{xy}$ refers to the intensity of the pixel at $(x, y)$. If the difference between the FWHMs of the images $R$ and $R'$ is small, the kernel can be sampled on a smaller grid. In general, a kernel function with an FWHM of $F_K$ yields a profile FWHM $F'$ on the convolved image of

$$F' \approx \sqrt{F^2 + F_K^2}, \tag{69}$$

where $F$ is the FWHM of the profiles on the image $R$.

Supposing two images, $I$ and $R$, the main problem of the image convolution and subtraction method is to find the appropriate kernel $K$ such that, when the image $R$ is convolved with it, the resulting image is nearly identical to $I$. The first attempt to find this optimal kernel (Tomaney & Crotts 1996) was based on an inverse Fourier transformation between the two PSFs of the images. Theoretically, the inverse Fourier transformation yields the appropriate kernel; however, the practical usage of this method is limited due to the high signal-to-noise ratio that is needed by a Fourier inversion. Kochanski, Tyson & Fischer (1996) attempted to find the kernel $K$ by minimizing the merit function



(70) 



This minimization yields a non-linear equation for the kernel $K$ and therefore it is not computationally efficient. The most cited algorithm related to image subtraction was given by Alard & Lupton (1998). In this work, an additional term was added to the convolution transformation, which allows one to fit not only the convolution transformation but also the background variations:

$$I = B + R \star K. \tag{71}$$

The basic idea of Alard & Lupton (1998) was to minimize the function

$$\chi^2 = \sum_{x,y} \left[I_{xy} - B_{xy} - (R \star K)_{xy}\right]^2 \tag{72}$$

and search the kernel solution $K$ in the form of

$$K = \sum_i C_i K^{(i)}. \tag{73}$$

In their work, the kernels $K^{(i)}$ were two-dimensional Gaussian functions with variable FWHMs multiplied by polynomials. Assuming the background variations to be constant, i.e. $B_{xy} = B$, minimizing equation (72) yields a linear set of equations for the parameters $B$ and $C_i$, thus its solution is straightforward (and efficient). Shortly after, Alard (2000) gave a more sophisticated method that allows the kernel parameters as well as the background level to vary across the image:

$$I = B(x, y) + [R \star K(x, y)]. \tag{74}$$

Both the background variations and the kernel coefficients were searched as polynomial functions of the pixel coordinates, namely

$$B(x, y) = \sum_{k,l} B_{kl} x^k y^l \tag{75}$$

and

$$K(x, y) = \sum_i \sum_{k,l} C_{ikl} x^k y^l K^{(i)}. \tag{76}$$

It is easy to show that finding the optimal coefficients $B_{kl}$ and $C_{ikl}$ still requires only linear least squares minimization. Alard & Lupton (1998) also discuss how the individual pixels used in the fit must be weighted by the Poisson noise level in order to have a consistent result. Recently, Bramich (2008) searched the optimal kernel $K$ by assuming an alternate set of kernel base functions $K^{(uv)}$, involving discrete kernels instead of Gaussian functions. These discrete kernels are defined as

$$K^{(uv)} = \delta^{(uv)}, \tag{77}$$

where

$$\left(\delta^{(uv)}\right)_{xy} = \begin{cases} 1 & \text{if } u = x \text{ and } v = y, \\ 0 & \text{otherwise.} \end{cases} \tag{78}$$

The total number of base kernels is then $N_{\rm kernels} = (2B_K + 1)^2$. Yuan & Akerlof (2008) attempted to find the solution $K_I$, $B$ and $K_R$ of the equation

$$I \star K_I = B + R \star K_R. \tag{79}$$

This method is known as cross-convolution and works properly in the cases when there is no suitable solution for equation (71). For instance, on the image $R$ the profiles have shape parameters with $K > 0$ and $D = 0$ while on the image $I$ these parameters are $K < 0$ and $D = 0$. The method of cross-convolution has a disadvantage: if one finds a solution $K_I$ and $K_R$ for equation (79), then $K_I \star G$ and $K_R \star G$ is also a solution (where $G$ is an arbitrary convolution kernel). Therefore equation (79) is degenerate unless additional constraints are introduced (e.g. by minimizing the difference $\|K_I - K_R\|$ simultaneously).
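
To illustrate why a kernel expanded on fixed base functions leads to a linear problem, the following Python sketch fits a spatially constant kernel on the discrete (delta-function) basis together with a constant background. It is a simplified illustration and not the ficonv implementation: the function names are hypothetical, all pixels get equal weight (the Poisson weighting mentioned above is omitted), and image edges are handled by simply excluding a border from the fit.

```python
import numpy as np

def shifted(ref, i, j):
    """Image whose pixel (x, y) holds ref at (x - i, y - j); edges wrap around."""
    return np.roll(np.roll(ref, j, axis=0), i, axis=1)

def fit_delta_kernel(target, ref, bk):
    """Least squares fit of target = B + ref * K with a (2*bk+1)^2 discrete kernel.

    Each base kernel is a single unit pixel, so the convolved reference is a
    linear combination of shifted copies of ref. Returns (kernel, background),
    with kernel[j + bk, i + bk] holding the coefficient of the shift (i, j).
    """
    ny, nx = ref.shape
    inner = (slice(bk, ny - bk), slice(bk, nx - bk))      # skip wrapped-around border pixels
    offsets = [(i, j) for j in range(-bk, bk + 1) for i in range(-bk, bk + 1)]
    cols = [shifted(ref, i, j)[inner].ravel() for i, j in offsets]
    cols.append(np.ones(cols[0].size))                    # constant background term B
    design = np.column_stack(cols)
    sol, *_ = np.linalg.lstsq(design, target[inner].ravel(), rcond=None)
    return sol[:-1].reshape(2 * bk + 1, 2 * bk + 1), sol[-1]
```

Spatial variations of the kernel and of the background could be added by multiplying each column of the design matrix by the monomials of the pixel coordinates, which keeps the problem linear at the cost of more unknowns.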



2.8.1 Reference frame 

The noise characteristics of the subtracted image are determined by both the reference image $R$ and the target image $I$. If both images are individual frames, the generic noise level is approximately $\sqrt{2}$ times larger than that of the individual frames. In order to reduce the noise level on the subtracted frames, the reference image $R$ is created from several individual frames. If the number of such frames is $N$, the noise level of the subtracted images is $\sqrt{1 + 1/N} \approx 1 + 1/(2N)$ times that of the individual frames (supposing that both the reference frames and the target image have the same noise level). Thus, $N \approx 20 - 25$ frames are sufficient to increase the noise level on the subtracted image by only a few percent.

Strictly speaking, a noisy reference frame implies correlated noise on the subtracted frames, since the same image (or its versions derived by convolution) is subtracted from the original frames. Therefore, this is an upper limit for the noise increment in the final light curves. However, the scatter in the convolution parameters also increases the light curve noise, but this cannot be quantified in a simple way.
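
As a quick numerical check of the noise estimate above, the following small Python sketch evaluates the noise factor for a few reference stack sizes. The function name is hypothetical and the calculation assumes, as in the text, independent frames with equal noise levels.

```python
import math

def subtraction_noise_factor(n_frames):
    """Noise of the subtracted image relative to a single frame, when the
    reference is the average of n_frames frames of equal, independent noise."""
    return math.sqrt(1.0 + 1.0 / n_frames)

for n in (1, 5, 20, 25):
    # exact factor next to the first-order approximation 1 + 1/(2N)
    print(n, round(subtraction_noise_factor(n), 4), round(1.0 + 1.0 / (2 * n), 4))
```

For a single reference frame the factor is $\sqrt{2}$, while for 20 and 25 frames it is about 1.025 and 1.020, respectively, i.e. an increase of roughly two percent.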



2.8.2 Registration

As it was seen related to the difficulties of the photome- 
try on undersampled images (Sec. ETTTTj) . the interpolation 
of such images with sharp profiles is likely to yield arti- 
facts, "spline undershoots" and therefore systematic resid- 
uals (Fig. ID). Since the FWHM of the HATNet frames is 
too small to clearly remove such residuals, we have used the 
following sophisticated registration process. First, using the stellar profile parameters and flux estimates yielded by the modelling described in Sec. 2.4.2, a model for the image is created with the program firandom (Sec. 2.12.7). This image model is then subtracted from the original image, yielding a residual with no sharp structures. The residual image is then transformed to the reference system, simultaneously with the transformation of the centroid coordinates found in the stellar profile parameter list. Using the transformed stellar profile parameters, another model image is created and added to the transformed residual image. Since the stellar profiles can be well modelled by an analytic function, this way of image registration yields no artifacts on the transformed images, even for highly undersampled profiles. Additionally, we do not have to involve all of the stars on the image, only the brighter ones, since for fainter stars the amplitude of the spline undershoots is comparable to or less than the noise level.

This kind of transformation is even more relevant during the creation of the reference image $R$, since this image is created by averaging some of the sharpest images.

2.8.3 Implementation 

Those methods discussed above that are based on the technique of linear least squares are implemented in the program ficonv (see Sec. 2.12.12). The practical details of the photometry based on the method of image subtraction are explained in Chapter 3, in relation to the HAT-P-7(b) planetary system.

2.9 Photometry on subtracted images 

As discussed in the previous section (Sec. 2.8), the method of image convolution and subtraction aids the photometry process by both decreasing the fluctuations in the background level and reducing the influence of the nearby stars on the background level. A great advantage of the image subtraction method is that it does not need to know about stars (initially), can use all of the pixels and works on extremely crowded images. In the simplest case, when both the reference image $R$ and the target image $I$ have exactly the same intensity level and the stellar profiles are nearly the same, the flux of a given star on image $I$ can be obtained by simply adding the reference flux and the flux measured on the residual image.

However, even in the cases where the stellar profiles are nearly the same but the images $R$ and $I$ have different intensity levels (for example, image $I$ was acquired at higher airmass or lower transparency while the reference was chosen to be one of the high signal-to-noise images, acquired at high horizontal altitudes), the photometry on the subtracted images is not as simple as before. Let us consider the following situation. The flux of a given isolated star on the reference image is found to be 1000 ADUs. In the target image, this star has an intrinsic flux decrease of 1%, thus if this image had been acquired under the same conditions as the reference image, the flux of the star would be 990 ADUs. Let us suppose now that due to the low sky transparency, all of the stars have a flux decrease of 50%, thus our star is measured to have a flux of 495 ADUs. The best fit kernel solution that transforms the reference image to the target image is then $K = \frac{1}{2}\delta$, i.e. half of the identity kernel. Therefore, the residual flux of the target star would be $495 - 500 = -5$ ADUs. If this residual flux is simply added to the reference flux, the obtained flux is 995 ADUs, thus the measured flux decrease (the signal itself) is significantly underestimated. Moreover, if the kernel solution implies a significant difference between the FWHMs of the stellar profiles in the reference and target images, both PSF and aperture photometry should be adjusted.

Using the formalism shown in Sec. 2.7.2, aperture photometry on subtracted images can be performed as follows. It is easy to show that for any weight matrix $A_{xy}$, the relation

$$\sum_{x,y} (R \star K)_{xy} (A \star K)_{xy} = \sum_{x,y} \left(R_{xy} A_{xy}\right) \|K\|_1^2 \tag{80}$$

is true if the aperture $A$ supports the convolved profile of $R \star K$, and it is a rather good approximation if the aperture has a size that is comparable to the profile FWHM. The norm $\|K\|_p$ is defined as

$$\|K\|_p := \sqrt[p]{\sum_{i,j} |K_{ij}|^p}. \tag{81}$$

Moreover, the ratio of the two sides in equation (80) is independent of $\|K\|_1$ (even if the aperture $A$ does not completely support the convolved profile on $R \star K$); in other words, this ratio does not change if $K$ is multiplied by an arbitrary positive constant. Therefore, involving an aperture $A_{xy}$, the flux of a source found on the convolved image $C = R \star K$ can be obtained as

$$f_C = \frac{\sum_{x,y} C_{xy} (A \star K)_{xy}}{\|K\|_1^2} \tag{82}$$

and this raw flux is independent of the large scale flux level variations that are quantified by $\|K\|_1$. The total flux $f$ of the source can be derived from the flux on the reference image and the flux of the target image. Since the method of image subtraction tries to find the optimal kernel $K$ that minimizes $\|I - B - R \star K\|_2$, combining equation (82) and equation (63) from Sec. 2.7.2, $f$ is obtained as

$$f = \frac{\sum_{x,y} S_{xy} (A \star K)_{xy}}{\|K\|_1^2} + \sum_{x,y} R_{xy} A_{xy}. \tag{83}$$

Here $S = I - B - R \star K$ is the subtracted image. Of course, one can derive a background level around the target object on the subtracted images, but in most cases this background level is zero within reasonable uncertainties. However, it is worth including such a background correction even on the subtracted images, since unpredictable small-scale background variations can occur at any time.
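
The 1000 ADU example above can be reproduced with a short Python sketch of the rescaled residual photometry. The code is only an illustration under simplifying assumptions: the function names are hypothetical, the aperture is a plain array of weights, the kernel is normalised by the square of the sum of its absolute values, and the convolution wraps around at the image edges.

```python
import numpy as np

def convolve(image, kernel):
    """Discrete convolution with a small square kernel (periodic edges for brevity)."""
    bk = kernel.shape[0] // 2
    out = np.zeros_like(image, dtype=float)
    for j in range(-bk, bk + 1):
        for i in range(-bk, bk + 1):
            out += kernel[j + bk, i + bk] * np.roll(np.roll(image, j, axis=0), i, axis=1)
    return out

def subtracted_flux(sub_image, ref_image, aperture, kernel):
    """Residual flux measured through the convolved aperture, rescaled by the
    squared kernel norm, plus the aperture flux on the reference image."""
    norm1 = np.abs(kernel).sum()
    residual = np.sum(sub_image * convolve(aperture, kernel)) / norm1 ** 2
    return residual + np.sum(ref_image * aperture)

# Worked example: a 1000 ADU star, 1% intrinsic dimming, 50% transparency loss.
ref = np.zeros((9, 9))
ref[4, 4] = 600.0
ref[3, 4] = ref[5, 4] = ref[4, 3] = ref[4, 5] = 100.0   # total of 1000 ADU
aperture = np.zeros((9, 9))
aperture[3:6, 3:6] = 1.0                                 # aperture covering the star
kernel = np.array([[0.5]])                               # half of the identity kernel
target = 0.495 * ref                                     # the star is measured at 495 ADU
sub = target - convolve(ref, kernel)                     # residual of -5 ADU in total
flux = subtracted_flux(sub, ref, aperture, kernel)
```

With these inputs `flux` comes out at approximately 990 ADU, i.e. the intrinsic 1% decrease is recovered, whereas simply adding the residual to the reference flux would give 995 ADU.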

For instance, variations yielded by thin clouds or scattered 



2.10 Trend filtering 

Photometric time series might show systematic variations due to various effects. Of course, if a certain star is indeed a variable, the main source of photometric variations should be the intrinsic changes in the stellar brightness. However, there are various other effects that yield unexpected trends in the light curves, which are still present after the magnitude transformation, even if sophisticated algorithms (such as image subtraction based photometry) are involved in the data reduction. The primary reasons for such trends are the following. Observational conditions might vary (even significantly) throughout the night: for instance, clouds block the light in some regions of the field, or the background level increases due to twilight or the proximity of the Moon. Additionally, instrumental effects, such as variations in the focal length or changes in the detector temperature, can result in various trends. Finally, improper data reduction is also responsible for such effects. For instance, faults in the calibration process, insufficiently large polynomial orders in the astrometric or magnitude transformations, underestimated or overestimated aperture sizes, badly determined PSFs and inappropriate reference frames are all plausible reasons for unexpected systematic variations. In this section, the efforts intended to reduce the remaining trends in light curves are summarized.

The basic concepts of trend removal are the following. First, one can assume that instrumental magnitudes have some remaining dependence on additional quantities that are also derived during the data reduction. Such external parameters can be the profile shape parameters, centroid coordinates, celestial positions (such as elevation or hour angle of the target field or object), or environmental parameters (external temperature). The dependence on these parameters therefore results in a definite correlation. Assuming some qualitative dependence, these correlations can then be removed, yielding light curves with smaller scatter. The type of the qualitative dependence is related to the parameters against which the de-correlation is performed (see some examples later on). In general, this method of External Parameter Decorrelation (EPD; see e.g. Bakos et al. 2007b) yields a linear least squares fit. Second, if we have no information about all of the external parameters or there are other sources of trends that cannot be quantified by any specific external parameter (for instance, thin clouds moving across the subsequent images), one can involve the Trend Filtering Algorithm (TFA; Kovács, Bakos & Noyes 2005). This algorithm is based on the experience that there are stars with no intrinsic variability showing the same features in their light curves. TFA removes these trends by using a set of template stars (preferably none of them variable) and searching for the coefficients of the linear combination of the template light curves that best fits the original signal; this best fit linear combination is then subtracted from the original signal. Fig. 18 displays these two primary sources of trends in the case of some non-variable stars. In the cases when analysis is

light, that cannot be characterized by a function like in equation (75).

These stars are suspected not to be variables above the noise
Figure 18. Typical examples of trends. The upper panels display the primary concepts of the External Parameter Decorrelation: for a particular star, the lower inset shows the variation of the profile sharpness parameter ($S$) throughout the night while the upper inset shows the instrumental magnitude. The panel in the upper-right corner shows the distribution of the individual measurements in the $S$ versus magnitude parameter space. The correlation between these two parameters can be seen clearly. The lower panels display light curves for two given stars in the same instrumental photometric system. The insets on the left show the two light curves while the plot in the lower-right corner shows the magnitude versus magnitude distribution. The correlation between the two magnitudes is quite clear in this case as well.



performed on a photometric data set which has only time series information about the magnitudes, the method of EPD cannot be applied while TFA can still be very effective (for a recent application, see e.g. Szulágyi, Kovács & Welch 2009).

Of course, there are several other methods found in the literature that are intended to remove, or at least decrease the amplitude of, unexpected systematic variations in the light curves. The concept of the SysRem method (Tamuz, Mazeh & Zucker 2005) can be summarized briefly as an algorithm that searches for decorrelation coefficients similar to the ones used in the EPD, simultaneously for all of the light curves, and then repeats this procedure by assuming the external parameters themselves to be unknowns. The SysRem method has been improved by Cameron et al. (2006) in order to have a more robust and reliable generic transit search algorithm. The ad-hoc template selection of the TFA has been replaced with a hierarchical clustering algorithm by Kim et al. (2008), assuming that stars showing similar trends are somehow localized. In the following, we



limits of the measurements. The data displayed here originate 
from the first follow-up transit measurements of the HAT-P-7(b) 
planetary system on 2007 November 2. See Chapter 3 for further
details about the related data reductions. 



are focusing on the EPD and TFA algorithms, since in the 
HATNet data reductions these algorithms play a key role. 

2.10.1 Basic equations for the EPD and TFA 

Let us assume that we have a photometric time series for a particular star and denote the instrumental magnitudes by $m_i$ ($i = 1, \dots, N$, where $N$ is the total number of data points). The external parameters involved in the decorrelation are denoted by $p_i^{(k)}$ ($k = 1, \dots, P$, where $P$ is the number of the independent external parameters) while the magnitudes of the template stars are $m_i^{(t)} + \bar{m}^{(t)}$ ($t = 1, \dots, T$, where $T$ is the total number of template stars and $\bar{m}^{(t)}$ is the mean magnitude of the template star $t$). The method of EPD then minimizes the merit function

$$\chi^2_{\rm EPD} = \sum_i w_i \left(m_i - m_0 - \sum_k E_k p_i^{(k)}\right)^2, \tag{84}$$

where $E_k$ are the appropriate EPD coefficients, $m_0$ is the mean brightness of the star and the weight of the given photometric point $i$ is $w_i$, usually $w_i = \sigma_i^{-2}$ ($\sigma_i$ is the individual photometric uncertainty of the measurement $i$). One of the most frequently used parameter vectors $\mathbf{p}_i$ in the EPD of HATNet light curves is $\mathbf{p}_i = \{x_i - \bar{x}, y_i - \bar{y}, S_i, D_i, K_i, 1/\cos(z_i), \tau_i\}$, where $x_i$ and $y_i$ are the centroid coordinates on the original frames, $S_i$, $D_i$ and $K_i$ are the stellar profile shape parameters defined in equation (32), $z_i$ is the zenith distance (thus $1/\cos(z_i)$ is the airmass) and $\tau_i$ is the hour angle. Here $\bar{q}$ refers to the average of the quantity $q$. Although the EPD method yields a linear equation for the coefficients $E_k$, omitting the subtraction of the average centroid coordinates might significantly offset the value of $m_0$ from the real mean magnitude. Due to the linearity of the problem, this is not relevant unless one wants to rely on the value of $m_0$ in some sense. The function that is minimized by TFA is

$$\chi^2_{\rm TFA} = \sum_i w_i \left(m_i - m_0 - \sum_t F_t m_i^{(t)}\right)^2, \tag{85}$$

where the appropriate coefficient for the template star $t$ is $F_t$. The similarities between equation (84) and equation (85) are obvious. Indeed, one can perform the two algorithms simultaneously, by minimizing the joint function

$$\chi^2_{\rm E+T} = \sum_i w_i \left(m_i - m_0 - \sum_k E_k p_i^{(k)} - \sum_t F_t m_i^{(t)}\right)^2. \tag{86}$$

The de-trended light curve is then

$$m_i^{\rm (EPD)} = m_i - \sum_k E_k p_i^{(k)}, \tag{87}$$

$$m_i^{\rm (TFA)} = m_i - \sum_t F_t m_i^{(t)} \quad \text{or} \tag{88}$$

$$m_i^{\rm (EPD+TFA)} = m_i - \sum_k E_k p_i^{(k)} - \sum_t F_t m_i^{(t)} \tag{89}$$

for EPD, TFA and the joint trend filtering, respectively.
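
Since all three merit functions are linear in the unknown coefficients, a single weighted least squares routine can cover EPD, TFA and the joint fit. The Python sketch below illustrates this; it is not the code used in the HATNet pipeline. The function name is hypothetical, and subtracting the mean of every external parameter (not only of the centroid coordinates) is a simplification made here.

```python
import numpy as np

def detrend(m, sigma, params=None, templates=None):
    """Fit the mean magnitude and the EPD and/or TFA coefficients jointly.

    m         : (N,) instrumental magnitudes
    sigma     : (N,) photometric uncertainties (weights are 1/sigma**2)
    params    : (N, P) external parameters for EPD, or None
    templates : (N, T) mean-subtracted template magnitudes for TFA, or None
    Returns the de-trended light curve and the fitted coefficients
    (mean magnitude first, then EPD coefficients, then TFA coefficients).
    """
    cols = [np.ones_like(m)]                      # column for the mean magnitude
    if params is not None:
        cols.append(params - params.mean(axis=0)) # centred external parameters
    if templates is not None:
        cols.append(templates)
    design = np.column_stack(cols)
    w = 1.0 / sigma                               # row scaling equivalent to 1/sigma^2 weights
    coeff, *_ = np.linalg.lstsq(design * w[:, None], m * w, rcond=None)
    trend = design[:, 1:] @ coeff[1:]             # everything except the mean magnitude
    return m - trend, coeff
```

Calling the function with only `params` corresponds to EPD, with only `templates` to TFA, and with both to the joint fit. For a variable star this naive fit can absorb part of the real signal, which is the issue addressed by the reconstructive and simultaneous approaches.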

2.10.2 Reconstructive and simultaneous trend removals 

Of course, we are not really interested in the de-trending of
non-variable stars. Unless one wants to quantify the generic
quality of a certain photometric pipeline, a trend removal
algorithm is relevant only in the cases where the stars have
intrinsic brightness variations. In the following, we suppose
that the physical variations can be quantified by a small set
of parameters $\{A_r\}$, namely the fiducial signal of a
particular star can be written as



$$m_i^{0} = m_0 + F(t_i, A_1, \dots, A_N), \qquad (90)$$



where F is some sort of model function. 

(Note on relying on the value of $m_0$: for instance, light
curves from the same source might have different average
magnitudes in the case of multi-station observations. The
average magnitudes are then shifted to the same level prior to
the joint analysis of these photometric data. Either $m_0$ or
the median value of the light curve magnitudes can be used as
an average value.)

In principle, one can handle variable stars in four ways.
First, even stars with physical brightness variations can be
treated as non-variable stars. This naive method is likely to
distort the signal shape, since it treats the intrinsic changes
in the brightness as unexpected ones. In the cases where the
periodicity of these intrinsic variations is close to the
periodicity of the generic trends (for instance, trends with a
period of a day are generally very strong), or when the period
is comparable to or longer than the observation window, either
EPD or TFA tends to kill the real signal itself. Second, one
can involve the method of signal reconstruction, as it was
implemented by Kovács, Bakos & Noyes (2005). In this method,
the signal model parameters $\{A_r\}$ are derived using the
noisy signal, and then the fit residuals undergo either the
EPD or TFA. The model signal $F(t_i, \dots)$ is added to the
de-trended residuals, yielding a complete signal
reconstruction. The steps can be repeated until convergence is
reached. Third, one can involve the simultaneous derivation of
the $A_r$ model parameters and the $E_k$/$F_t$ coefficients by
minimizing the merit function



$$\chi^2 = \sum_i w_i \left[ m_i - m_0 - F(t_i, \{A_r\}) - \sum_k E_k p_i^{(k)} \right]^2. \qquad (91)$$



(This merit function shows the simultaneous trend removal
for EPD. The TFA and the joint EPD+TFA can be applied
similarly.) The fourth method derives the $E_k$ and/or $F_t$
coefficients on sections of the light curve where the star
itself shows no real variations. This is a definitely useful
method in the analysis of planetary transit light curves, since
the star itself can be assumed to have constant brightness
within noise limitations and therefore the light curve should
show no variations before and after the transit. If these
out-of-transit sections of the light curves are sufficiently
long, the trend removal coefficients $E_k$ and/or $F_t$ can
safely be obtained.

(Note on the assumption that a transit host star has constant
brightness: this holds at least in most of the cases. A famous
counter-example is the star CoRoT-Exo-2 of Alonso et al.
(2008).)

There are some considerations regarding the
$F(t_i, A_1, \dots, A_N)$ function and its parameters $\{A_r\}$
that should be mentioned here. In principle, one can use a model
function that is related to the physics of the variations. For
instance, a light curve of a transiting extrasolar planet host
star can be well modelled by 5 parameters: period ($P$),
epoch ($E$), depth of the transit ($d$), duration of the
transit ($T_{14}$) and the duration of the ingress/egress
($T_{12}$); see e.g. Carter et al. (2008) about how these
parameters are related to the physical parameters of the system,
such as normalized semimajor axis, planetary radius and orbital
inclination. (Other parameters might be present if we do not
have a priori assumptions for the limb darkening and/or the
planetary orbit is non-circular and the signal-to-noise ratio of
the light curve is sufficiently large to see the asymmetry.)
Although the respective model function,
$F_{\rm transit}(t_i, P, E, d, T_{14}, T_{12})$, is highly
non-linear in its parameters, the simultaneous signal fit and
trend removal of equation (91) can be performed, and the fit
yields reliable results in general. (This holds only if the
transit instances inter/extrapolated from the initial guess for
the epoch $E$ and period $P$ sufficiently cover the observed
transits. Otherwise, all of the parametric derivatives of $F$
will be zero and only methods based on systematic grid search,
e.g. BLS, yield reliable results.) In the cases where we do not
have any a priori knowledge of the source of the variations, but
the signal can be assumed to be periodic, one can use a periodic
model for $F$, for instance a linear combination of step
functions. Although the number of free parameters that must be
involved in such a fit is significantly larger, in the cases of
HATNet light curves the fit can be achieved properly. The signal
reconstruction algorithm of Kovács, Bakos & Noyes (2005) uses a
step function (also known as "folded and binned light curve
models") for this purpose. Likewise, $F$ can also be written as
a Fourier series with a finite number of terms. If the period
and epoch are kept fixed, both assumptions for the function $F$
(i.e. step function or Fourier expansion) yield a linear fit for
both the model parameters and the EPD/TFA coefficients.
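
To illustrate the last statement, the sketch below fits a truncated Fourier series and the EPD coefficients in a single linear least-squares step, with the period and epoch kept fixed. It is only an illustration under simplifying assumptions: unit weights are used, and the names fit_signal_and_trend and solve_linear are hypothetical and are not taken from the package.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_signal_and_trend(times, mag, period, epoch, ext_params, n_harmonics=2):
    """Simultaneously fit m0, a Fourier signal model and the EPD coefficients.

    Returns (signal_coeffs, epd_coeffs, detrended), where signal_coeffs is
    [m0, a_1, b_1, a_2, b_2, ...] for the terms a_h cos(h phi) + b_h sin(h phi)
    and detrended is the light curve with only the fitted trend removed.
    """
    n = len(mag)
    phase = [2.0 * math.pi * (t - epoch) / period for t in times]
    columns = [[1.0] * n]
    for h in range(1, n_harmonics + 1):
        columns.append([math.cos(h * ph) for ph in phase])
        columns.append([math.sin(h * ph) for ph in phase])
    n_signal = len(columns)
    for p in ext_params:
        mean = sum(p) / n
        columns.append([v - mean for v in p])  # mean-subtracted parameters
    k = len(columns)
    normal = [[sum(columns[a][i] * columns[b][i] for i in range(n))
               for b in range(k)] for a in range(k)]
    rhs = [sum(columns[a][i] * mag[i] for i in range(n)) for a in range(k)]
    coeff = solve_linear(normal, rhs)
    trend = [sum(coeff[j] * columns[j][i] for j in range(n_signal, k))
             for i in range(n)]
    detrended = [m - tr for m, tr in zip(mag, trend)]
    return coeff[:n_signal], coeff[n_signal:], detrended
```

Because the signal terms and the trend terms share one design matrix, the trend coefficients are not biased by treating the intrinsic variation as noise, which is the point of the simultaneous approach.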

It should be mentioned here that the signal reconstruction
mode and the simultaneous trend removal yield roughly the
same results. However, a prominent counter-example is the case
of HAT-P-11b (Bakos, Torres, Pál et al. 2009), where the
reconstruction mode yielded an unexpectedly high impact
parameter for the system. In this case, only the method of
simultaneous EPD and TFA was able to reveal a refined set of
light curve parameters that are expected to be more accurate on
an absolute scale. Further discussion of this problem can be
found in Bakos, Torres, Pál et al. (2009).

2.10.3 Efficiency of these methods 

It is important to emphasize that both the EPD and TFA
algorithms (independently of their naive, reconstructive
or simultaneous applications) reduce the effective degrees
of freedom and therefore the light curve scatter always
decreases. In order to determine whether the application of
any of these algorithms is effective, one should compute the
unbiased residuals of the fit after the derivation of the
decorrelation coefficients. Alternatively, one can increase the
scatter of a particular light curve by the factor
$\sqrt{N/(N-P)}$, where $N$ is the total number of data points in
the light curve and $P$ is the number of parameters involved in
the EPD or TFA. We should keep in mind that both during the
selection of the appropriate external parameters and during the
template selection, the unbiased residuals must be checked
carefully, otherwise the efficiency of these algorithms can
easily be overrated.
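
The correction above is a one-line computation. A minimal sketch follows; the function name unbiased_scatter is hypothetical and is only used here for illustration.

```python
import math


def unbiased_scatter(residuals, n_params):
    """Return the rms of the fit residuals scaled by sqrt(N / (N - P)).

    residuals: light curve residuals after EPD and/or TFA (N values).
    n_params:  number of fitted parameters P.
    """
    n = len(residuals)
    if n <= n_params:
        raise ValueError("need more data points than fitted parameters")
    rms = math.sqrt(sum(r * r for r in residuals) / n)
    return rms * math.sqrt(n / (n - n_params))
```

Comparing this value before and after adding an external parameter or a template shows whether the extra term really improves the light curve or only absorbs degrees of freedom.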

2.11 Major concepts of the software package 

Continuous monitoring of the sky yields an enormous amount
of data. In the HATNet project, 6 telescopes expose images
with a cadence of 5.5 minutes. Each image is a 2k x 2k
(up to August 2007) or 4k x 4k array of pixels, thus the
amount of data gathered on each clean night is ~80-120
scientific frames for a single telescope, equivalent to 7-11
or 30-45 gigabytes of uncompressed calibrated images
(assuming frames with the size of 2k x 2k or 4k x 4k pixels,
respectively). In other words, if a single field is monitored
for 2 months by two of the telescopes (see e.g. Bakos et al.
2007b, for a description of the actual observational
principles), yielding ~5000 individual scientific frames, the
amount of data associated with this field is ~300-350 gigabytes
in the form of calibrated images (assuming 4k x 4k images). If
photometry is performed on these frames, the amount of
associated information for 10 000 stars and for a single frame
is ~3 megabytes of data, therefore one needs hundreds of
gigabytes of storage space just for the photometric results.
All in all, the total amount of data that can be associated
with the reduction of a single monitored field can even be
close to one terabyte, including all of the results of the
previously mentioned data types as well as other ones, for
instance astrometrical information, subtracted images, or light
curves with some sort of de-trending.
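
The quoted volumes can be checked with simple arithmetic. The sketch below rests on two assumptions that are not stated in the text: calibrated pixels are stored on 4 bytes each, and the nightly figures count all six telescopes together. Under these assumptions the numbers above are reproduced; the function names are hypothetical.

```python
BYTES_PER_PIXEL = 4   # assumption: 32-bit pixels in calibrated images
GIB = 1024 ** 3       # one gigabyte taken as 2^30 bytes


def frame_bytes(side):
    """Size of one uncompressed square frame with the given side in pixels."""
    return side * side * BYTES_PER_PIXEL


def nightly_volume_gib(side, frames_per_telescope, telescopes=6):
    """Data volume of one night for the whole network, in gigabytes."""
    return frame_bytes(side) * frames_per_telescope * telescopes / GIB


def field_volume_gib(side, n_frames):
    """Data volume of all calibrated frames of one monitored field."""
    return frame_bytes(side) * n_frames / GIB
```

With these definitions, nightly_volume_gib(2048, 80) and nightly_volume_gib(2048, 120) give 7.5 and 11.25, the 4k x 4k case gives 30 and 45, and field_volume_gib(4096, 5000) gives 312.5, in line with the ranges quoted above.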



The components of the software package must be appropriate
to manage such a huge amount of data. Thus, before going into
the details of the practical implementation, two issues should
be clarified. First, what kinds of data structures appear
during the reduction of the images? This is a rather important
question since the programs not only have to access and
manipulate these data, but the resource limitations of the
computers also constrain the available solutions. Second, what
are the existing software solutions that can efficiently be
exploited? We are especially focusing on operating systems and
related tools that are supported by larger communities and have
a free and portable implementation.



2.11.1 Data structures 

At first glance, data associated with image reduction can
be classified into two major groups. The first group, which
requires most of the storage space, is in the form of
massive linear data, such as sequences of records, arrays of
basic types or other multidimensional arrays. Astronomical
images, processed images (such as registered or subtracted
ones), instrumental photometric information, light curves,
de-trended light curves, and Fourier or other kinds of spectra
of the light curves belong to this group. All of these data are
sets of records with the same structure. For instance,

• an image is a two dimensional array of integer or real
numbers;

• a list of extracted sources is a series of records, where
each record contains information on the source's coordinates,
brightness, shape parameters and possible catalogue identifiers;

• a light curve is a series of individual photometric
measurements, where each measurement has a time, some sort
of quality flag, magnitudes for various apertures and/or
various photometric methods, and uncertainty estimations;

• instrumental photometry consists of records that contain the
same kind of information as the records of light curves, but
one set of records is associated not with a particular object
covering a long timebase, but with a single frame and numerous
individual objects; and

• additional catalogue information for each star can be useful
in the interpretation of the photometric time series, such as
brightness, color, spectral type, evolutionary state,
parallax (if known) and variability (if known).

These data types are referred to in the following simply as
"data" in a general context.

The second major group of data types is the "metadata",
which do not have a linear structure like the data types
discussed above, and represent a definitely smaller amount of
information. For instance,

• observational conditions for each image, such as date 
and time of the observation, location, instrument descrip- 
tion, primary target object or field; 

• astrometric solution, where the information itself is the 
transformation that maps a reference catalogue to the frame 
of the image; 

• point-spread function for a single image; 

• kernel solution, that describes the convolution function 
used in the process of image subtraction.

Table 3. Comparison of various data storage schemes. In this list, "blobs" is used as an acronym for "binary large objects" (a collection of purely binary data in a single file).

FITS [a]

Pros:

• Flexible Image Transport System. The most common
format and standard for astronomical data storage,
especially for images (either raw or calibrated) and spectra.

• Extensible format: it supports not only multidimensional
numeric arrays, but structured flat tables and ASCII tables
can also be stored inside a single FITS file.

• Metadata storage is also available in the form of keywords
and their associated values. For instance, the location of
observation, information about the observer, date and time of
the observation and instrumental details (such as filters,
exposure time, optical data for the telescope) are stored
almost always in FITS files involving consensual keywords.

Cons:

• Although some parts of the FITS files are stored in
ASCII form (such as keywords and their values and textual
tables), extracting data from FITS files requires special
tools. Moreover, to access just a smaller segment of the FITS
data, one likely has to parse the whole file. For instance, if
in a single file there are 100 stored tables and one needs data
only from the last table, all of the other tables (at least
their headers) have to be read and parsed, since there is no
pre-defined location of the last table.

• Inserting or removing some keywords to/from a FITS
header likely results in an update of the whole file.

Binary (large) objects

Pros:

• Binary large object files (also known as "blobs") provide
the fastest way for both accessing (reading) and writing data.

• Various indexing algorithms are available to make the
data access more efficient. Such algorithms can be optimized
for any kind of data structure and access mode, including
sequential access or two- or multidimensional hierarchy of
data records.

Cons:

• Such blobs are not human-readable; special programs
are required for accessing, reading or modifying the data.
Basic tools found in UNIX-like systems are not capable of
generic manipulation of binary data.

• Binary representation of integers and floating point
numbers depends on the actually used computer/processor
architecture. Unless special attention is given, such blobs
cannot be copied from one computer to another if they are
using different architectures. Involving an architecture
independent storage format reduces some advantages of blobs
(such as fast access).

Linear ASCII/text files

Pros:

• Human-readable format, easy to interpret.

• Basic tools found in UNIX-like systems are capable of
viewing or manipulating plain textual data.

• All of the programming languages, including data
processing environments and plotting tools, support reading
and parsing numeric data from textual formats.

• Modifications are easy to implement. Any kind of text
editor or word processor is appropriate for manual
manipulation of the data.

Cons:

• Access to massive numeric data stored in textual
format can be significantly slower than access to the same
amount of data stored in blobs.

• Random or even non-sequential access of small chunks
of data stored in a single text file requires the reading and
parsing of the whole file.

• The same type of data requires 5-8 times larger storage
space than if these data were stored in blobs (depending on
the actual data types and/or our needs for a well structured
file).

Third-party applications: database servers, external storage systems

Pros:

• Easy to maintain. Database solutions support various
methods for management and access control, and such servers
come with programming interfaces for many languages and
environments.

• The underlying database engines involve a large number
of algorithms for optimal data storage and allow efficient
queries (using various indexing methods). The engines can
be fine-tuned in order to optimize for our particular problem.

Cons:

• The indexing and therefore the access to the data is
optimized for one dimensional arrays of records (i.e. "flat"
tables). Therefore storing images or other two- or
multidimensional data structures (such as astronomical
catalogs or long photometric time series of an enormous number
of objects) cannot be implemented efficiently using classical
database engines.

[a] The detailed documentation about the FITS file format is available from http://fits



Table 4 summarizes the above mentioned data types
and their expected storage space requirements appearing in
the photometric analysis.

Table 4. An overview of data files used to store information required by the image reduction process or created during the reduction. Each file type is referred to by its extension. The entries are grouped by the number of files and, within each group, by the scaling of the file size.

Number of files: 1

  File size independent of N_frame and N_object:
  • *.config: generic information about the whole reduction and
    observational conditions (name and coordinates of the target
    field, involved reduction algorithms and their fine-tune
    parameters)

  File size ∝ N_frame:
  • *.list: list of frames to be processed during the reduction
  • *.stat: basic statistics for each frame (both image
    statistics such as number of detected objects and information
    on the observational circumstances, e.g. zenith distance,
    airmass, elevation of the Moon, stellar profile FWHM)

  File size ∝ N_object:
  • *.cat: list of objects and some catalogue information that
    is used during the reduction
  • *.lcstat: light curve statistics (also known as
    "magnitude-rms" statistics)

Number of files: N_frame

  File size independent of N_frame and N_object:
  • *.trans: astrometric solution (the transformation that maps
    the reference catalogue to the coordinate system of the image)
  • *.kernel: kernel solution (the convolution function used in
    the image subtraction process)
  • *-psf.fits: best fit point-spread function for a given image

  File size ∝ N_frame: none

  File size ∝ N_object:
  • *.fits: calibrated images [a]
  • *-sub.fits: convolved and subtracted images [a]
  • *.stars: list of detected sources and their properties
    (coordinates, shape parameters, brightness estimation)
  • *.phot: instrumental photometric measurements

Number of files: N_object

  File size independent of N_frame and N_object:
  • *.xmmc: best fit and Monte-Carlo distribution of the
    parameters of the light curve model function (if the object
    turned out to be interesting)
  • *.info: summary information of the planetary, orbital and
    stellar data for the actual object (if the object is indeed a
    planet-harboring star)

  File size ∝ N_frame:
  • *.lc: light curves
  • *.epdlc: de-trended light curves involving only the External
    Parameter Decorrelation algorithm
  • *.tfalc: de-trended light curves involving the Trend Filter
    Algorithm

  File size ∝ N_object: none

[a] Strictly speaking, the size of these files does not depend on the number of objects that are extracted from the image and/or targets for further photometry. However, larger images tend to have a greater number of sources of interest.

Of course, both linear data and metadata that are created
during the image reduction process should be stored in
some format. There are various concepts for data formats
available in modern computers and operating systems, so
one can choose the most suitable format for each purpose. In
astronomy, people commonly store and share data in FITS
format. Many programs use human-readable (ASCII or text)
files both for input and output. Some other programs store
their information in binary format, where the contents of
the files cannot even be viewed without a special program.
And there are robust database systems that hide the details
of the actual storage and give a relatively lightweight
interface to access or manipulate the data. Each type of
the above mentioned data representations has its own
advantages and disadvantages. In Table 3 these properties are
summarized for these four major representation schemes.

During the reduction of HATNet data, we have chosen a
mixed form of data representation as follows. The images,
including the raw, calibrated and processed ones, are stored in
FITS format. Moreover, we use three dimensional FITS images
to store the spatial variations of the point-spread function.
Other metadata, such as astrometrical solutions, kernel
solutions and catalogue information, are stored in text files.
Instrumental photometric measurements and light curves are
also stored in the form of text files. Temporary data (needed
for intermediate steps of the reduction) are stored in binary
form, since such data do not need to be portable and an
advantage of the binary format is the significantly smaller
storage space requirement.
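
The trade-off between text files and binary storage can be illustrated with a few lines of Python. This is only a sketch and not the format used by the package: the record layout (time, magnitude and uncertainty as three little-endian doubles) and all function names are assumptions made for this example. Fixing the byte order explicitly is one way to keep such blobs portable between architectures.

```python
import struct

# Hypothetical record layout: time, magnitude, uncertainty.
# "<" fixes little-endian byte order regardless of the host architecture.
RECORD = struct.Struct("<3d")


def to_text(records):
    """Serialize records as human-readable, whitespace separated lines."""
    return "".join("%.7f %.5f %.5f\n" % rec for rec in records)


def from_text(text):
    """Parse the output of to_text(); the whole text has to be scanned."""
    return [tuple(float(v) for v in line.split())
            for line in text.splitlines() if line.strip()]


def to_blob(records):
    """Serialize records as fixed-size binary records."""
    return b"".join(RECORD.pack(*rec) for rec in records)


def from_blob(blob):
    """Decode all records of a blob created by to_blob()."""
    return list(RECORD.iter_unpack(blob))


def read_record(blob, index):
    """Random access: decode a single record without parsing the others."""
    return RECORD.unpack_from(blob, index * RECORD.size)
```

The fixed record size is what makes read_record() possible, while the text form can be inspected and edited with standard UNIX tools at the cost of sequential parsing.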



2.11.2 Operating environment 

In order to have both a portable and robust set of
tools, one has to build a software package on top of a
widely standardized and documented environment. The
most widespread and approved standard is the "Portable
Operating System Interface" or POSIX, which is intended to
standardize almost all layers of the operating system, from
the system-level application program interfaces (APIs, such
as file manipulation or network access) up to the highest
level of programs such as shell environments, related
scripting languages and other basic utilities.

The actual development of the package fi/fihat was
done under GNU/Linux systems, which are among the most
frequently used POSIX compliant, UNIX-like free operating
systems. The main code was written in ANSI C (featured with
some GNU extensions) and is intended to be compiled without
any difficulties on various other UNIX systems such as
SunOS/Sparc and Mac OS X. The compilation of the package does
not require additional packages or libraries, only the GNU C
Compiler (gcc), its standard library (glibc), the associated
standard header files and some related development utilities
(such as make or the ar object archiver; in almost all of the
systems these come with gcc as its dependencies). Therefore,
all of the requirements of the package include only free and
open source software (F/OSS).

The related resources are available from the following addresses:

POSIX: http://en.wikipedia.org/wiki/POSIX
GNU: http://www.gnu.org/
Linux: http://www.kernel.org/
UNIX-like systems: http://en.wikipedia.org/wiki/Unix-like
gcc: http://gcc.gnu.org/
glibc: http://www.gnu.org/software/libc/
make: http://www.gnu.org/software/make/

In practice, to have a complete data reduction environment,
the users of the package might have to use additional
text processing utilities such as an implementation of the
AWK programming language (for instance, gawk, which is
included in all of the free GNU/Linux systems) and basic
text processing utilities (such as paste, cat, sort, split,
included in the textutils/coreutils GNU package). And
finally, for visualization purposes, the SAOImage/DS9
utility (Joye & Mandel 2003) is highly recommended.

2.12 Implementation 

In this subsection I summarize the standalone programs that 
are implemented as distinct binary executables. The pro- 
grams can be divided into two well separated groups with 
respect to the main purposes. In the first group there are the 
programs that manipulate the (astronomical) images them- 
selves, i.e. read an image, generate one or do a specific trans- 
formation on an image. In the second group, there are the 
programs that manipulate textual data, mostly numerical 
data presented in a tabulated form. 

Generally, all of these programs are capable of the
following.

• The codes give release and version information, and
the invocation can be logged on demand. The version
information can be reported by a single call of the binary;
moreover, it is logged along with the invocation arguments
in the form of special FITS keywords (if the main output
of the actual code is a processed FITS image) and in the
form of textual comments (if the main output of the code
is text data). Preserving the version information along with
the invocation arguments makes any kind of output easily
reproducible.

• All of the codes are capable of reading the data to be
processed from the standard input and writing the output data
to the standard output. Since many of these programs
manipulate relatively large amounts of data, the number of
unnecessary hard disk operations should be kept as small as
possible. Moreover, in many cases the output of one of the
programs is the input of another one. Pipes, available in
all of the modern UNIX-like operating systems, are basically
designed to perform such bindings between the output and
input of two programs. Therefore, such a capability of
redirecting the input/output data streams significantly reduces
the overhead of background storage operations.

• For the programs that deal with symbolic operations and
functions, a general back-end library (available from
http://libpsn.sf.net) is provided to make a user-friendly
interface to specify arithmetic expressions. This kind of
approach is rarely used in software systems, since such a
symbolic specification of arithmetic expressions does not
provide a standalone language. However, it allows an easy and
transparent way for arbitrary operations, and turned out to be
very efficient in higher level data reduction scripts.

• The programs that manipulate FITS images are capable
of handling files with multiple extensions. The FITS standard
allows the user to store multiple individual images, as well as
(ASCII or binary) tabulated data in a single file. The control
software of some detectors produces images that are stored
in this extended format, for example, detectors where the
charges from the CCD chip are read out in multiple directions
(therefore the camera electronics utilizes more than one
amplifier and A/D converter, thus yielding different bias
and noise levels). Other kinds of detectors (which acquire
individual images with a very short exposure time) might
store the data in the three dimensional format called "data
cube". The developed codes are also capable of handling such
data, therefore it is possible to do reductions on images
obtained by the Spitzer Space Telescope, which optionally uses
such data structures for image storage.

The additional utilities mentioned in the previous subsection are available from the following addresses:

binutils (ar): http://www.gnu.org/software/binutils/
gawk: http://www.gnu.org/software/gawk/
coreutils: http://www.gnu.org/software/coreutils/
SAOImage/DS9: http://hea-www.harvard.edu/RD/ds9/

The list of standalone binaries that come with the package
and their main purposes are shown in Table 5.
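
The first two conventions listed above (logging the version and the invocation as comments, and working as a filter between standard input and standard output) can be sketched in a few lines. The example below is written in Python purely for illustration; it is not one of the programs of the package, and the name run_filter, the version string and the "shift the last column" operation are all hypothetical.

```python
VERSION = "0.1-sketch"  # hypothetical version string


def run_filter(argv, instream, outstream, offset=0.0):
    """Copy tabulated text from instream to outstream, shifting the last column.

    The version and the invocation arguments are written first as textual
    comments, so the output documents how it was created.
    """
    outstream.write("# Created by %s version %s\n" % (argv[0], VERSION))
    outstream.write("# Invoked as: %s\n" % " ".join(argv))
    for line in instream:
        if not line.strip() or line.lstrip().startswith("#"):
            outstream.write(line)  # pass comments and blank lines through
            continue
        fields = line.split()
        fields[-1] = "%.4f" % (float(fields[-1]) + offset)
        outstream.write(" ".join(fields) + "\n")
```

In a real command line tool the function would be called as run_filter(sys.argv, sys.stdin, sys.stdout), so that several such programs can be chained with pipes without writing intermediate files to the disk.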

2.12.1 Basic operations on FITS headers and keywords - 
fiheader 

The main purpose of the fiheader utility is to read specific 
values from the headers of FITS files and/or alter them on 
demand. 

Although most of the information about the observational
conditions is stored in the form of FITS keywords,
image manipulation programs use only the necessary ones
and most of the image processing parameters are passed as
command line arguments (such keywords and data are, for
example, the gain, the image centroid coordinates and the
astrometrical solutions). The main reasons why this kind of
approach was chosen are the following.

• First, interpreting many of the standard keywords leads
to false information about the image in the cases of
wide-field or heavily distorted images. Such a parameter is the
gain, which can be highly inhomogeneous for images acquired
by an optical system with non-negligible vignetting; the
gain itself then cannot be described by a single real number,
but rather by a polynomial or some equivalent function.
Similarly, the standard World Coordinate System information,
describing the astrometrical solution of the image, has been
designed for small field-of-view images, i.e. the number of
coefficients is insufficient to properly constrain the
astrometry of a distorted image.

• Second, altering the meanings of standard keywords
leads to incompatibilities with existing software. For
example, if the format of the keyword GAIN was changed to be
a string of a finite set of real numbers (describing a
spatially varying gain), other programs would not be able to
parse this redefined keyword.

Therefore, our conclusion was not to alter the syntax of the
existing keywords, but to define some new ones (wherever it was
necessary). The fiheader utility enables the user to read any
of the keywords, and allows higher level scripts to interpret
the values read from the headers and pass their values to
other programs in the form of command line arguments.

(Note on the gain: the de facto standard for it is the GAIN keyword.)

Table 5. An overview of the standalone binary programs included in the package, displaying their main purposes and the types of input and output data.

fiarith
  Main purpose: Evaluates arithmetic expressions on images as operands.
  Type of input: A set of FITS images.
  Type of output: A single FITS image.

ficalib
  Main purpose: Performs various calibration steps on the input images.
  Type of input: A set of raw FITS images.
  Type of output: A set of calibrated FITS images.

ficombine
  Main purpose: Combines (most frequently averages) a set of images.
  Type of input: A set of FITS images.
  Type of output: A single FITS image.

ficonv
  Main purpose: Obtains an optimal convolution transformation between two images or uses an existing convolution transformation to convolve an image.
  Type of input: Two FITS images or a single image and a transformation.
  Type of output: A convolution transformation or a single image.

fiheader
  Main purpose: Manipulates, i.e. reads, sets, alters or removes some FITS header keywords and/or their values.
  Type of input: A single FITS image (alteration) or more FITS images (if header contents are just read).
  Type of output: A FITS image with altered header or a series of keywords/values from the headers.

fiign
  Main purpose: Performs low-level manipulations on masks associated to FITS images.
  Type of input: A single FITS image (with some optional mask).
  Type of output: A single FITS image (with an altered mask).

fiinfo
  Main purpose: Gives some information about the FITS image in a human-readable form or creates image stamps in a conventional format.
  Type of input: A single FITS image.
  Type of output: Basic information or PNM images.

fiphot
  Main purpose: Performs photometry on normal, convolved or subtracted images.
  Type of input: A single FITS image (with additional reference photometric information if the image is a subtracted one).
  Type of output: Instrumental photometric data.

firandom
  Main purpose: Generates artificial object lists and/or artificial (astronomical) images.
  Type of input: List of sources to be drawn to the image or an arithmetic expression that describes how the list of sources is to be created.
  Type of output: List of sources and/or a single FITS image.

fistar
  Main purpose: Detects and characterizes point-like sources from astronomical images.
  Type of input: A single FITS image.
  Type of output: List of detected sources and an optional PSF image (in FITS format).

fitrans
  Main purpose: Performs generic geometric (spatial) transformations on the input image.
  Type of input: A single FITS image.
  Type of output: A single, transformed FITS image.

fi[un]zip
  Main purpose: Compresses and decompresses primary FITS images.
  Type of input: A single uncompressed or compressed FITS image file.
  Type of output: A single compressed or uncompressed FITS image file.

grcollect
  Main purpose: Performs data transposition on the input tabulated data or does some sort of statistics on the input data.
  Type of input: A set of files containing tabulated data.
  Type of output: A set of files containing the transposed tabulated data or a single file for the statistics, also in a tabulated form.

grmatch
  Main purpose: Matches lines read from two input files of tabulated data, using various criteria (point matching, coordinate matching or identifier matching).
  Type of input: Two files containing tabulated data (that must be two point sets in the case of point or coordinate matching).
  Type of output: One file containing the matched lines and, in the case of point matching, an additional file that describes the best fit geometric transformation between the two point sets.

grselect
  Main purpose: Selects lines from tabulated data using various criteria.
  Type of input: A single file containing tabulated data.
  Type of output: The filtered rows from the input data.

grtrans
  Main purpose: Transforms a single coordinate list or derives a best-fit transformation between two coordinate lists.
  Type of input: A single file containing a coordinate list and a file that describes the transformation, or two files, each one containing a coordinate list.
  Type of output: A file with the transformed coordinate list in tabulated form or a file that contains the best-fit transformation.

lfit
  Main purpose: General purpose arithmetic evaluation, regression and data analysis tool.
  Type of input: Files containing data to be analyzed in a tabulated form.
  Type of output: Regression parameters or results of the arithmetic evaluation.



2.12.2 Basic arithmetic operations on images - fiarith

The program fiarith allows the user to perform simple operations
on one or more astronomical images. Provided that all of the
input images have the same size, the program supports per-pixel
arithmetic operations as well as manipulations that depend on
the pixel coordinates themselves.

The invocation syntax simply reflects the desired operations.
For example, the common way of calibrating an image I, using
bias (B), dark (D) and flat (F) images, can be written as

    C = \frac{I - B - D}{F / \|F\|},    (92)

where C denotes the calibrated image (see also equation 1).
Thus, the computation of the calibrated image C can be
written as

    fiarith "('I'-'B'-'D')/('F'/norm('F'))" -o C
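The per-pixel evaluation of equation (92) can be illustrated with a minimal Python sketch. This is not the fiarith implementation: the function name `calibrate` and the list-of-rows image representation are hypothetical, and the norm of the flat is assumed here to be its mean pixel value.

```python
from statistics import fmean


def calibrate(image, bias, dark, flat):
    """Evaluate C = (I - B - D) / (F / ||F||) pixel by pixel.

    All images are lists of rows with identical dimensions.
    Assumption: ||F|| is taken to be the mean of the flat image.
    """
    flat_norm = fmean(value for row in flat for value in row)
    return [
        [(i - b - d) / (f / flat_norm) for i, b, d, f in zip(ri, rb, rd, rf)]
        for ri, rb, rd, rf in zip(image, bias, dark, flat)
    ]
```

With a flat whose mean is 2, a pixel with a flat value of 1 is scaled up by a factor of 2 after the bias and dark levels are subtracted.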



2.12.3 Basic information about images - fiinfo 

The aim of the program fiinfo is twofold. First, this program
is able to gather statistics and masking information about
the image. These include

• general statistics, such as mean, median, minimum, 
maximum, standard deviation of the pixel values; 

• statistics derived after rejecting the outlier pixels; 

• estimations for the background level and its spatial vari- 
ations; 

• estimations for the background noise; and 

• the number of masked pixels, broken down by mask type.

The most common usage of fiinfo in this statistical mode
is to reject those calibration frames that seem to be faulty
(e.g. saturated sky flats or aborted images).

Second, the program is able to convert astronomical
images into widely used graphics file formats. Almost
all of the scaling options available in the well-known DS9
program (see Joye & Mandel 2003) have been implemented
in fiinfo; moreover, the user can define arbitrary color
palettes as well. In practice, fiinfo creates images only in
PNM (portable anymap) format. Images stored in this format
can then be converted to any of the widely used graphics
file formats (such as JPEG or PNG), using existing software
(e.g. netpbm, convert/ImageMagick). Figures in this thesis
displaying stamps from real (or mock) astronomical images
have also been created using this mode of the program.



2.12.4 Combination of images - ficombine 

The main purpose of image combination is to create a single
image with a good signal-to-noise ratio from individual images
with lower signal-to-noise ratios. The program ficombine is
intended to perform averaging of individual images. In practice,
this program has two uses. First, it is used to create the
master calibration frames, as defined by equation (3),
equation (5) and equation (7). Second, the reference
frame required by the method of image subtraction is also
created by averaging individual registered object frames (see
also Sec. 2.12.11 for the details of image registration).

In the actual implementation, such combination is performed
as a per-pixel averaging, where the method of averaging
and its fine-tuning parameters can be specified via command
line arguments. The most frequently used "average
values" are the mean and the median. In many applications,
rejection of outlier values is required, for instance,
to omit pixels affected by cosmic ray events. The respective
parameters for tuning the outlier rejection are also given as
command line options. See Sec. 2.12.5 for an example of
the usage of ficombine in the calibration pipeline.
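Per-pixel averaging with outlier rejection can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ficombine algorithm: the function names, the `sigma` and `iterations` parameters and the particular clipping rule (distance from the median in units of the population standard deviation) are all hypothetical choices.

```python
from statistics import mean, median, pstdev


def combine_pixel(values, mode="median", sigma=None, iterations=2):
    """Average one pixel stack, optionally rejecting outliers first."""
    vals = list(values)
    if sigma is not None:
        for _ in range(iterations):
            if len(vals) < 3:
                break
            centre = median(vals)
            scatter = pstdev(vals)
            kept = [v for v in vals if abs(v - centre) <= sigma * scatter]
            if not kept or len(kept) == len(vals):
                break
            vals = kept
    return median(vals) if mode == "median" else mean(vals)


def combine(images, mode="median", sigma=None):
    """Combine equally sized images (lists of rows) pixel by pixel."""
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [combine_pixel([img[y][x] for img in images], mode, sigma)
         for x in range(cols)]
        for y in range(rows)
    ]
```

In this sketch the median is already robust against a single cosmic ray hit, while the mean needs the rejection step to suppress it.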



2.12.5 Calibration of images - ficalib 

In principle, the program ficalib implements the evaluation
of equation (1) in an efficient way. It is optimized for
the assumption that all of the master calibration frames are
the same for all of the input images. Because of this assumption,
the calibration process is much faster than if it
were done independently on each image, using the program
fiarith.

Moreover, the program ficalib automatically performs
the overscan correction (if the user specifies overscan
regions), and also trims the image to its designated size (by
clipping these overscan areas). The output images inherit
the masks from the master calibration images, and
additional pixels of the input images might be masked if
they are found to be saturated and/or bloomed. When a
single-chip camera uses multiple readout gates, amplifiers
and A/D converters, the images are stored in a so-called
mosaic format (as in the case of KeplerCam). The program
ficalib is able to combine these mosaic image regions into
one single image.
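The overscan correction and trimming steps can be illustrated with a simplified Python sketch. It is not the ficalib implementation: the function name and arguments are hypothetical, and the overscan level is estimated here as a plain per-row median rather than with a fitted model of the overscan region.

```python
from statistics import median


def overscan_correct_and_trim(raw, overscan_cols, image_cols):
    """Subtract a per-row overscan level and clip to the image area.

    raw is a list of rows; overscan_cols and image_cols are
    (first, last) inclusive, zero-based column ranges.
    Assumption: the overscan level is the median of each row's
    overscan pixels (a simplification for illustration).
    """
    o0, o1 = overscan_cols
    c0, c1 = image_cols
    corrected = []
    for row in raw:
        level = median(row[o0:o1 + 1])
        corrected.append([value - level for value in row[c0:c1 + 1]])
    return corrected
```

The returned image contains only the designated image columns, with the electronic offset measured in the overscan area removed row by row.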

In Fig. 19 a shell script is shown that demonstrates the
usage of the programs ficalib and ficombine in a real-life
application, namely how the images acquired by the FLWO
KeplerCam* are completely calibrated.



2.12.6 Rejection and masking of nasty pixels - fiign 

The aim of the program fiign is twofold. First, it is
intended to perform low-level operations on masks associated
with FITS images, such as removing some of the masks,
converting between layers of the masks and merging or
combining masks from separate files. Second, various methods
exist with which the user can add additional masks based on
the image itself. These additional masks can be used to mark
saturated or blooming pixels, pixels with unexpectedly low
and/or high values or extremely sharp structures, especially
pixels produced by cosmic ray events.

This program is a crucial piece of the calibration
pipeline if the pipeline is implemented purely with the
fiarith program. However, most of the functionality of fiign
is also integrated in ficalib (see Sec. 2.12.5). Since ficalib
implements the calibration operations much more efficiently
than individual calls of fiarith would, fiign is used only
occasionally in practice.



2.12.7 Generation of artificial images - firandom

The main purpose of the program firandom is to create
artificial images. These artificial images can be used either
to create model images for real observations (for instance,
to remove fitted stellar PSFs) or as mock images that are
intended to simulate the influence of one or more
observational artifacts and realistic effects. In principle,
firandom creates an image with a given background
level on which sources are drawn. Additionally, firandom
is able to add noise to the images, simulating the
effect of readout and background noise as well as photon
noise. In the case of mock images, firandom is also able
to generate the object list itself. The stellar profile models
that are supported by firandom, and are therefore available
for artificial images, are the same set of functions described
in Sec. 2.4.2. Moreover, firandom is able to draw stellar
profiles derived from PSFs (by the program fistar, see also
Sec. 2.12.8).

The program features symbolic input processing, i.e. the
variations in the background level, the spatial distribution of
the object centroids (in the case of mock images), the profile
shape parameters, the fluxes of individual objects and the noise
level can be specified not only as a tabulated dataset but also
in the form of arithmetic expressions. These expressions
can involve various built-in arithmetic operators and
functions, including random number generators. Of course, the
generated mock coordinate lists can also be saved in
tabulated form. The mock images used during the generation of
Fig. 1, Fig. 2 or Fig. [S] have been created by firandom.

In Fig. 20 some examples are shown that demonstrate
the usage of the program firandom.
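The basic idea of drawing sources on a noisy background can be sketched in Python as follows. This is a simplified illustration and not the firandom implementation: the function name, its parameters and the use of a circular Gaussian profile sampled at pixel centres are assumptions, and photon noise is omitted for brevity.

```python
import math
import random


def mock_image(size, stars, sky=100.0, sky_noise=10.0, fwhm=3.2, seed=0):
    """Create a mock image as a list of rows.

    size  : (width, height) in pixels
    stars : list of (x, y, flux) tuples
    Assumption: stars are circular Gaussians of the given FWHM,
    evaluated at pixel centres (no sub-pixel integration).
    """
    rng = random.Random(seed)
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    width, height = size
    image = [[sky + rng.gauss(0.0, sky_noise) for _ in range(width)]
             for _ in range(height)]
    for x0, y0, flux in stars:
        peak = flux / (2.0 * math.pi * sigma ** 2)
        for y in range(height):
            for x in range(width):
                r2 = (x - x0) ** 2 + (y - y0) ** 2
                image[y][x] += peak * math.exp(-0.5 * r2 / sigma ** 2)
    return image
```

Setting `sky_noise` to zero yields a noiseless model image, which is the kind of image one would subtract from an observation to remove fitted stellar profiles.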



* See: http://www.sao.arizona.edu/FLWO/48/kep.primer.html

#!/bin/sh

# Names of the individual files storing the raw bias, flat and object frames are stored here:
BIASLIST=($SOURCE/[0-9]*.BIAS.fits)
FLATLIST=($SOURCE/[0-9]*.FLAT.fits)
OBJLIST=($SOURCE/[0-9]*.TARGET.fits)

# Calibrated images: all of the images get an 'R' prefix and are put in the appropriate directory:
R_BIASLIST=($(for f in ${BIASLIST[*]} ; do echo $MSTTMP/bias/R`basename $f` ; done))
R_FLATLIST=($(for f in ${FLATLIST[*]} ; do echo $MSTTMP/flat/R`basename $f` ; done))
R_OBJLIST=($(for f in ${OBJLIST[*]} ; do echo $TARGET/R`basename $f` ; done))

# These below are KeplerCam specific data, defining the topology and geometry of the CCD itself.
# The camera has four readout registers and therefore four amplifiers and A/D converters as well.
MS_NAME=(IM1 IM2 IM3 IM4)
MS_OPAR=spline,order=3,iterations=2,sigma=3
MS_OVER=(area={2::7:1023,1034::1039:1023,2::7:1023,1034::1039:1023},${MS_OPAR})
MS_OFFS=(1024,1024 0,1024 1024,0 0,0)
MS_TRIM=image=[8::1031:1023]

M_ARGS="--mosaic size=[2048,2048]"
M_ARGS="$M_ARGS --mosaic [name=${MS_NAME[0]},$MS_TRIM,overscan=[${MS_OVER[0]}],offset=[${MS_OFFS[0]}]]"
M_ARGS="$M_ARGS --mosaic [name=${MS_NAME[1]},$MS_TRIM,overscan=[${MS_OVER[1]}],offset=[${MS_OFFS[1]}]]"
M_ARGS="$M_ARGS --mosaic [name=${MS_NAME[2]},$MS_TRIM,overscan=[${MS_OVER[2]}],offset=[${MS_OFFS[2]}]]"
M_ARGS="$M_ARGS --mosaic [name=${MS_NAME[3]},$MS_TRIM,overscan=[${MS_OVER[3]}],offset=[${MS_OFFS[3]}]]"

# The calibration of the individual bias frames, followed by their combination into a single master image:
ficalib -i ${BIASLIST[*]} --saturation 50000 $M_ARGS -o ${R_BIASLIST[*]}
ficombine ${R_BIASLIST[*]} --mode median -o $MASTER/BIAS.fits

# The calibration of the individual flat frames, followed by their combination into a single master image:
ficalib -i ${FLATLIST[*]} --saturation 50000 $M_ARGS -o ${R_FLATLIST[*]} \
        --input-master-bias $MASTER/BIAS.fits --post-scale 20000
ficombine ${R_FLATLIST[*]} --mode median -o $MASTER/FLAT.fits

# The calibration of the object images:
ficalib -i ${OBJLIST[*]} --saturation 50000 $M_ARGS -o ${R_OBJLIST[*]} \
        --input-master-bias $MASTER/BIAS.fits --input-master-flat $MASTER/FLAT.fits



Figure 19. A shell script demonstrating the proper usage of the ficalib and ficombine programs on the example of the calibration of
the KeplerCam mosaic images. The names of the files containing the input raw frames (both calibration frames and object frames) are
stored in the arrays $BIASLIST[*], $FLATLIST[*] and $OBJLIST[*]. The variable $M_ARGS contains all necessary information related to
the specification of the mosaic topology and geometry as well as the overscan areas associated with each readout direction. The individual
calibrated bias and flat frames are stored in the subdirectories of the $MSTTMP directory. These files are then combined into a single master
bias and a single master flat frame, which are used in the final step of the calibration, when the object frames themselves are calibrated. The final
calibrated scientific images are stored in the directory $TARGET. Note that each flat frame is scaled after calibration to have a mean value
of 20,000 ADU. In the case of dome flats, this scaling is not necessary, but in the case of sky flats, this step corrects for the variations
in the sky background level (during dusk or dawn).



2.12.8 Detection of stars or point-like sources - fistar 

The star detection and stellar profile modelling algorithms
described in Sec. 2.4 are implemented in the program
fistar. The main purpose of this program is therefore to
search for and characterize point-like sources. Additionally,
the program is able to derive the point-spread function
of the image, and spatial variations of the PSF can also be
fitted up to arbitrary polynomial order.

The list of detected sources, their centroid coordinates,
shape parameters (including FWHM) and flux estimations
are written to a previously defined output file. This file
can have an arbitrary format, depending on our needs. The
best fit PSF is saved in FITS format. If the PSF is supposed
to be constant throughout the image, the FITS image
is a normal two-dimensional image. Otherwise, the PSF
data and the associated polynomial coefficients are stored
in "data cube" format, and the size of the z (NAXIS3) axis
is $(N_{\rm PSF}+1)(N_{\rm PSF}+2)/2$, where $N_{\rm PSF}$ is the polynomial
order used for fitting the spatial variations.
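The size of the third axis follows from counting the monomials of a two-dimensional polynomial of a given order. A short Python sketch makes this explicit; the function names are hypothetical and the ordering of the terms is an assumption made only for illustration.

```python
def psf_cube_planes(order):
    """Number of planes, (N + 1)(N + 2) / 2, for polynomial order N."""
    if order < 0:
        raise ValueError("polynomial order must be non-negative")
    return (order + 1) * (order + 2) // 2


def monomials(order):
    """Exponent pairs (px, py) of all terms x^px * y^py with px + py <= order.

    Assumption: terms are listed degree by degree; the actual plane
    ordering used in the data cube is not specified here.
    """
    return [(px, degree - px)
            for degree in range(order + 1)
            for px in range(degree, -1, -1)]
```

A spatially constant PSF (order 0) gives a single plane, a linear variation gives three planes and a quadratic variation gives six.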



2.12.9 Basic coordinate list manipulations - grtrans 

The main purpose of the program grtrans is to perform
coordinate list transformations, mostly related to stellar
profile centroid coordinates and astrometric transformations.
Since this program is used extensively together with the program
grmatch, examples and further discussion of this program
can be found in the next section, Sec. 2.12.10.

#!/bin/sh
firandom --size 256,256 \
         --list "f=3.2,500*[x=g(0,0.2),y=g(0,0.2),m=15-5*r(0,1)^2]" \
         --list "f=3.2,1400*[x=r(-1,1),y=r(-1,1),m=15+1.38*log(r(0,1))]" \
         --sky 100 --sky-noise 10 --integral --photon-noise --bitpix -32 --output globular.fits
firandom --size 256,256 \
         --list "5000*[x=r(-1,1),y=r(-1,1),s=1.3,d=0.3*(x*x-y*y),k=0.6*x*y,m=15+1.38*log(r(0,1))]" \
         --sky 100 --sky-noise 10 --integral --photon-noise --bitpix -32 --output coma.fits
firandom --size 256,256 \
         --list "f=3.0,100*[X=36+20*div(n,10)+r(0,1),Y=36+20*mod(n,10)+r(0,1),m=10]" \
         --sky "100+x*10-y*20" --sky-noise 10 --integral --photon-noise --bitpix -32 --output grid.fits
for base in globular coma grid ; do
    fiinfo ${base}.fits --pgm linear,zscale --output-pgm - | pnmtoeps -g -4 -d -o ${base}.eps
done



Figure 20. Three mock images generated using the program firandom. The first image (globular.fits) on the left shows a "globular
cluster" with some field stars as well. For simplicity, the distribution of the cluster stars is Gaussian and their magnitude distribution is
quadratic, while the field stars are distributed uniformly and their magnitudes are derived by assuming uniformly distributed stars of constant
brightness. The second image (coma.fits) simulates an effect on the stellar profiles similar to what comatic aberration would cause. The
shape parameters δ and κ (referred to as d and k in the command line argument of the program, see also Sec. 2.4.2) are specific functions
of the spatial coordinates. The magnitude distribution of the stars is the same as for the field stars in the previous image. The third
image (grid.fits) shows a set of stars positioned on a grid. The background of this image is not constant. The shell script below the
image stamps is used to create these FITS files. The body of the last loop in the script converts the FITS files into PGM format,
using the fiinfo utility (see Sec. 2.12.3) and the well-known zscale intensity scaling algorithm (see DS9, Joye & Mandel 2003). The
images yielded by fiinfo are instantly converted to EPS (encapsulated PostScript) files, which is the preferred format for many typesetting
systems, such as LaTeX.

Figure 21. Vector plots of the difference between the transformed
reference and the input star coordinates for a typical HAT field.
The left panel shows the difference for second-order, the right
panel for fourth-order polynomial fits.

Figure 22. The difference between the Y coordinates of the
transformed reference and the input star coordinates for a typical
HAT field. The left panel shows the difference for fourth-order,
the right panel for sixth-order polynomial fits.



2.12.10 Matching lists or catalogues - grmatch 

The main purpose of the grmatch code is to implement the
point matching algorithm that is the key point in the derivation
of the astrometric solution and source identification. See
Section 2.5 for more details on the algorithm itself. We
note here that although the program grmatch is sufficient
for point matching and source identification purposes,
one needs other codes to interpret or use the outcome of this
program. For instance, a tabulated list of coordinates can be
transformed from one reference frame to another using the
program grtrans, while the program fitrans is able to
apply these transformations (yielded by grmatch) to FITS
images in order to, for instance, register images to the same
reference frame.

2.12.10.1 Typical applications. As discussed
before, the programs grmatch and grtrans are involved in
the photometry pipeline, following the star detection. If the
accuracy of the coordinates in the reference catalogue is
sufficient to yield a consistent plate solution, one can obtain the
photometric centroids by simply invoking these programs. A
more sophisticated example for these programs is shown in
Fig. 23. In this example these programs are invoked twice in
order to both derive a proper astrometric solution* and
properly identify the stars with larger proper motions**.
Such an iterative invocation scheme is used frequently in the
reduction of follow-up photometry data (see Chapter 4
and Sec. 4.1 for some other practical details). The simple
direct application of grmatch and grtrans as a part of a
complete photometric pipeline is displayed in Fig. 29.
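The role of the maximum matching distance in such a two-pass scheme can be illustrated with a small Python sketch of plain coordinate matching. This is not the grmatch algorithm (which is described in Sec. 2.5); the function name, the tuple layouts and the brute-force nearest-neighbour search are hypothetical simplifications.

```python
import math


def match_coords(reference, detected, max_distance):
    """Pair each reference entry with its nearest detection.

    reference : list of (identifier, x, y) tuples, already transformed
                to the frame of the detections
    detected  : list of (x, y) tuples
    Returns (identifier, detected_index) pairs for separations that do
    not exceed max_distance.
    """
    pairs = []
    for name, rx, ry in reference:
        best = None
        for index, (dx, dy) in enumerate(detected):
            dist = math.hypot(dx - rx, dy - ry)
            if dist <= max_distance and (best is None or dist < best[1]):
                best = (index, dist)
        if best is not None:
            pairs.append((name, best[0]))
    return pairs
```

A generous distance identifies all sources, including those displaced by proper motion, whereas a tight distance keeps only the sources that are suitable for constraining the astrometric transformation.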



2.12.11 Transforming and registering images - fitrans 

As discussed earlier (Sec. 2.8), the image convolution
and subtraction process requires the images to be in
the same spatial reference system. The details of this
registration process have already been explained in Sec. 2.6.
The purpose of the program fitrans is to implement these
various image interpolation methods.

In principle, fitrans reads an image and a transformation
file, performs the spatial transformation and writes the
output image to a separate file. Image data are read from
FITS files while the transformation files are presumably
derived from the appropriate astrometric solutions. The output
of the grmatch and grtrans programs can be directly
passed to fitrans. Of course, fitrans takes into account
the masks associated with the given image and derives the
appropriate mask for the output file. Pixels which cannot be
mapped from the original image always have a value of zero
and are marked as outer pixels (see also Sec. 2.3).

In the HATNet data reduction, this spatial transformation
requires a significant amount of CPU time since the exact
integration on biquadratic interpolation surfaces is a
computationally expensive process (Sec. 2.6.3). However, distinct
image transformations can be performed independently (i.e.
a given transformation does not have any influence on
other transformations), thus the complete registration
process can easily be performed in parallel.



* By taking into account only the stars with negligible proper
motion.

** That would otherwise significantly distort the astrometric
solution.



2.12.12 Convolution and image subtraction - ficonv 

This member of the fi/fihat package is intended to
implement the tasks related to the kernel fit, image
convolution and subtraction. In principle, ficonv has two basic
modes. First, assuming an existing kernel solution, it
evaluates equation (71) on an image and writes the convolved
result to a separate image file. Second, assuming a base set
of kernel functions (equation 73) and some model for the
background variations (equation 75), it derives the best fit
kernel solution for equation (71), described by the
coefficients $C_{ik\ell}$ and $B_{k\ell}$, respectively. Since this fit yields a linear
equation for these coefficients, the method of classic linear
least squares minimization can be applied efficiently. However,
the least squares matrix can have a relatively large
dimension in the cases where the kernel basis is also large
and/or higher order spatial variations are allowed. In the fit
mode, the program yields the kernel solution, and optionally
the convolved image ($C = B + R \star K$) and the subtracted residual
image ($S = I - C$) can also be saved into separate files
without additional invocations of ficonv and/or fiarith.

The program ficonv also implements the fit for cross-convolution
kernels (equation 79). In this case, the two kernel
solutions are saved to two distinct files. Subsequent
invocations of ficonv and/or fiarith can then be used to
analyze various kinds of outputs.

In Sec. 2.9 we discussed the relevance of the kernel
solution in the case when the photometry is performed
on the residual (subtracted) images. The best fit kernel
solution obtained by ficonv has to be passed directly to the
program fiphot (Sec. 2.12.13) in order to properly take into
account the convolution information during the photometry
(equation 83).

2.12.13 Photometry - fiphot

The program fiphot is the main code in the fi/fihat package
that performs the raw and instrumental photometry. In
the current implementation, we focused on aperture
photometry, performed on normal and subtracted images.
Basically, fiphot reads an astronomical image (FITS
file) and a centroid list file, where the latter should contain
not only the centroid coordinates but the individual object
identifiers as well*.

In the case of image subtraction-based photometry, fiphot
also requires the kernel solution (derived by ficonv). If this
information is omitted, the results of the photometry
are neither reliable nor consistent. See also Sec. 2.9 for
further details about this issue.

In Fig. 29 a complete shell script is displayed, as an
example of various fi/fihat programs related to the
photometry process.

Currently, PSF photometry is not implemented directly
in the program fiphot. However, the program fistar
(Sec. 2.12.8) is able to do PSF fitting on the detected
centroids, although its output is not compatible with that of
fiphot. Alternatively, lfit (see Sec. 2.12.16) can be used to
perform profile fitting, if the pixel intensities are converted

* If the proper object identification is omitted, fiphot assigns
some arbitrary (but unique) identifiers to the centroids;
however, in practice such output is almost useless.

for base in ${LIST_OF_FRAMES[*]} ; do
    grmatch --reference $CATALOG --col-ref $COL_X,$COL_Y --col-ref-ordering -$COL_MAG \
            --input $AST/$base.stars --col-inp 2,3 --col-inp-ordering +8 \
            --weight reference,column=$COL_MAG,magnitude,power=2 \
            --order $AST_ORDER --max-distance $MAX_DISTANCE \
            --output-transformation $AST/$base.trans --output $AST/$base.match || break
    grtrans $CATALOG \
            --col-xy $COL_X,$COL_Y --input-transformation $AST/$base.trans \
            --col-out $COL_X,$COL_Y --output - | \
    grmatch --reference - --col-ref $COL_X,$COL_Y --input $AST/$base.stars --col-inp 2,3 \
            --match-coords --max-distance $MAX_MATCHDST --output - | \
    grtrans --col-xy $COL_X,$COL_Y --input-transformation $AST/$base.trans --reverse \
            --col-out $COL_X,$COL_Y --output $AST/$base.match
done







Figure 23. A typical application of the grmatch and grtrans programs, for the cases where a few of the stars have high proper motion
and thus have significant offsets from the catalogue positions. For each frame (named $base), the input catalogue ($CATALOG) is matched
with the respective list of extracted stars (found in the $AST/$base.stars file), keeping a relatively large maximum distance between the
nominal and detected stellar positions ($MAX_DISTANCE, e.g. 4-6 pixels, derived from the expected magnitude of the proper motions since
the catalogue epoch and the approximate plate scale). This initial match identifies all of the sources (including the ones with large
proper motion), stored in the $AST/$base.match file in the form of matched detected source and catalogue entries. However, the astrometric
transformation (stored in $AST/$base.trans) is systematically affected by these high proper motion stars. In order to get rid of this
effect, the match is performed again by excluding the stars with higher residual distance (by setting $MAX_MATCHDST to e.g. 1-2 pixels).
The procedure is then repeated for all frames (elements of the $LIST_OF_FRAMES[] array) in a similar manner.



SELF=$0; base="$1"
if [ -n "$base" ] ; then
    fitrans ${FITS}/$base.fits \
            --input-transformation ${AST}/$base.trans --reverse -k -o ${REG}/$base-trans.fits
else
    pexec -f BASE.list -e base -o - -u - -c -- "$SELF \$base"
fi


SELF=$0; base="$1"
if [ -n "$base" ] ; then
    KERNEL="i/4;b/4;d=3/4"
    ficonv --reference ./photref.fits \
           --input ${REG}/$base-trans.fits --input-stamps ./photref.reg --kernel "$KERNEL" \
           --output-kernel-list ${AST}/$base.kernel --output-subtracted ${REG}/$base-sub.fits
else
    pexec -f BASE.list -e base -o - -u - -c -- "$SELF \$base"
fi









Figure 24. Two shell scripts demonstrating the invocation syntax of fitrans and ficonv. Since the computation of the transformed
and convolved images requires a significant amount of CPU time, the utility pexec (http://shellpexec.sf.net) is used to run the jobs in
parallel on multiple CPUs.



to ASCII tables in advance*; however, this is not
computationally efficient.

* The program fiinfo is able to produce such tables with
three columns: a list of x and y coordinates followed by the
respective pixel intensities.

2.12.14 Transposition of tabulated data - grcollect

Raw and instrumental photometric data obtained for each
frame are stored in separate files by default, as discussed
earlier (see Sec. [121, Sec. [Ij] and Sec. 2.12.13). We
refer to these files as photometric files. In order to analyze
the per-object outcome of our data reductions, one has to
have the data in the form of light curve files. Therefore, the
step of photometry (including the magnitude transformation)
is followed immediately by the step of transposition.
See Fig. 25 for how this step looks in a simple case
of 3 photometric files and 4 objects.

The main purpose of the program grcollect is to perform
this transposition on the photometric data in order
to have the measurements stored in the form of light

grcollect ${PHOT}/IMG-*.phot --col-base 2 --prefix ${LC}/ --extension lc --max-memory 256m

cat ${PHOT}/IMG-*.phot | grcollect - --col-base 2 --prefix ${LC}/ --extension lc --max-memory 256m



Figure 25. The schematics of the data transposition. Records for individual measurements are written initially to photometry files
(having an extension of *.phot, for instance). These records contain the source identifiers. During the transposition, photometry files are
converted to light curves. In principle, these light curves contain the same records but sorted into distinct files by the object names, not
the frame identifiers. The command lines on the lower panel show some examples of how this data transposition can be performed with
the program grcollect.



curves and therefore to be suitable for further per-object
analysis (such as light curve modelling). The invocation
syntax of grcollect is also shown in Fig. 25. Basically, only a small
amount of information is needed for the transposition
process: the names of the input files, the index of the column
in which the object identifiers are stored and the optional
prefixes and/or suffixes for the individual light curve file
names. The maximum amount of memory that the program is
allowed to use is also specified on the command line. In
fact, grcollect does not need the original data to be stored
in separate files. The second example in Fig. 25 shows an
alternative way of performing the transposition, in which
all of the data are read from the standard input (the
preceding cat command dumps all of the data to the standard
output, and the two commands are connected by a single
uni-directional pipe).
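The transposition of photometric files into light curve files with a bounded memory buffer can be sketched in Python as follows. This is only an illustration, not the grcollect source: the function name and its parameters are hypothetical, the identifier column is zero-based here, and records are assumed to be whitespace-separated text lines.

```python
import os
from collections import defaultdict


def transpose(phot_files, out_dir, id_column=1, max_records=100000,
              extension="lc"):
    """Append each record to the light curve file of its object."""
    buffer = []

    def flush():
        # Group the buffered records by object identifier and append
        # each group to the corresponding light curve file.
        by_object = defaultdict(list)
        for line in buffer:
            by_object[line.split()[id_column]].append(line)
        for obj in sorted(by_object):
            path = os.path.join(out_dir, f"{obj}.{extension}")
            with open(path, "a") as out:
                out.writelines(by_object[obj])
        buffer.clear()

    for path in phot_files:
        with open(path) as src:
            for line in src:
                if not line.strip() or line.startswith("#"):
                    continue
                buffer.append(line if line.endswith("\n") else line + "\n")
                if len(buffer) >= max_records:
                    flush()
    flush()
```

Each flush appends a small chunk to many different files, which is exactly why the output files of such a simple scheme tend to become fragmented when the data do not fit in memory.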



The actual implementation of the transposition inside
grcollect is very simple: it reads the data from the
individual files (or from the standard input) as long as the data
fit in the available memory. Once this temporary memory is
full of records, the array is sorted by the object identifier
and the sorted records are written/concatenated to distinct
files. The output files are named after the appropriate
object identifiers. This procedure is repeated as long as
data are available. Although this method creates the light curve
files, neither the process itself nor the subsequent access
to these light curve files is efficient. In the case of HATNet,
when we have thousands of frames in a single reduction and
there are several tens or hundreds of thousands of individual
stars that are intended to have photometric measurements

Figure 26. Storage schemes for photometric data. Supposing a
series of frames, on which nearly the same set of stars have
individual photometric measurements, the figure shows how these
data can be arranged for practical usage. The target stars (their
identifiers) are arranged along the abscissa while the ordinate
shows the frame identifiers to which individual measurements
(symbolized by dots) belong. Raw and instrumental photometric
data are therefore represented here as rows (see the marked
horizontal stripe for frame #3, for instance) while the columns
refer to light curves. In practice, naive ways of transposition are
extremely inefficient if the total amount of data does not fit into
the memory. The transposition can be sped up by using an
intermediate stage of data storage, so-called macroblocks. In the
figure, each macroblock is marked by an enclosing rectangle. See
text for further details.

and each record is quite long*, the total amount of data
* A record for a single photometric measurement is several hun-
is of the order of hundreds of gigabytes. Even on
present-day computers, such a large amount of data does
not fit in the memory. Therefore, referring to the simple
process discussed above, light curve files are not written to
the disk at once but in smaller chunks. These chunks are
located on different cylinders of the disk; the files are therefore
extremely fragmented. Both the creation of and the access to
these fragmented files are extremely inefficient, since
fragmented files require additional, highly time-consuming disk
operations such as random seeks between cylinders. In practice,
even on the modern computers used by the project,
the whole process requires a day or so to complete,
although sequential access to some hundreds of gigabytes
of data would require only an hour or a few hours
(with a plausible I/O bandwidth of ~50 MB/s). In order
to overcome this problem, one can either use an external
database engine that features optimizations for such
two-dimensional queries or tweak the above transposition
algorithm to avoid unexpected and/or expensive disk
operations. We now briefly summarize an approach by which the
transposition can be made more efficient if we make some
assumptions about the data structure. The program grcollect
is able to do transpositions even if some of the keys (stellar
identifiers) are missing or if there is more than one
occurrence of a single key in a given file. Let us assume
that 1) in each input file every stellar identifier is unique 
and 2) the number of missing keys is negligible compared to 
the total number of photometric data records*®. Assuming 
a total of A^F frames and A* unique stellar identifiers (in 
the whole photometric data), the total number of records is 
Ar < ApA*. The total memory capacity of the computer is 
able to store M records simultaneously. Let us denote the 
average disk seek time by t and the sequential access speed 
by u) (in the units of records per second). The transposi- 
tion can then be performed effectively in two stages. In the 
first stage the photometry files are converted to individual 
files, so-called macroblocks, where each of them is capable to 
store (M/Ap) x (A//A*) records, each macroblock represent 
a continuous rectangle in the stellar identifier - frame space 
(see Fig. [26}. In the second stage, macroblock files are con- 
verted into light curves. Due to the size of the macroblock, 
MAp/A* photometric files can be read up sequentially and 
stored in the memory at the same time. If the relation 



is true for the actual values of M, Af, A*, w and r, the mac- 
roblocks can be accessed randomly after the first stage (in- 
dependently from the order in which they have been written 
to the disk), without too much dead time due to the ran- 
dom seeks. Therefore, at the second stage when macroblocks 
are read in the appropriate order of the stellar identifiers, 
MA*/Ap light curves can be flushed simultaneously without 
any additional disk operations beyond sequential writing. 

dreds of bytes long since it contains information for multiple aper- 
tures (including flux error estimations and quality flags) as well 
as there are additional fields for the stellar profile parameters and 
other observational quantities used in further trend filtering. 

Each record represents a single photometric measurement for 
a single instant, including all additional relevant data (such as 
the parameters involved in the EPD analysis, see earlier) 



In the case of the computers used in HATNet data 
reduction, $M \approx 10^7$, $N_F \approx 10^4$, $N_\star \approx 10^5$, $\omega \approx 10^5$ records/sec 
and $\tau \approx 10^{-2}$ sec, the right-hand side of equation (93) is 
going to be $\approx 10^2$, so the discussed way of two-stage 
transposition is very efficient. Indeed, the whole operation can 
be completed within 3-5 hours, instead of the day or few 
days needed by the normal one-stage transposition. 
Moreover, due to the lack of random seeks, the computer 
itself remains responsive to user interactions. In the 
case of one-stage transposition, the extraordinary amount 
of random seeks inhibits almost any interactive usage. 
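
The two-stage scheme described above can be illustrated with a short Python sketch. It is not the grcollect implementation: the function names are hypothetical, a dictionary stands in for the macroblock files on disk, and every frame is assumed to contain every star exactly once (the two assumptions made in the text). The helper `read_to_seek_ratio` only restates the efficiency argument, i.e. the time needed to read one macroblock of $(M/N_F) \times (M/N_\star)$ records sequentially, divided by one seek time.

```python
def two_stage_transpose(frames, frames_per_block, stars_per_block):
    """Transpose frames[f][s] into light_curves[s][f] via macroblocks."""
    n_frames = len(frames)
    n_stars = len(frames[0]) if frames else 0

    # Stage 1: read a group of frames sequentially and cut it into
    # macroblocks, each covering a rectangle in (frame, star) space.
    macroblocks = {}  # stands in for the macroblock files on disk
    for f0 in range(0, n_frames, frames_per_block):
        group = frames[f0:f0 + frames_per_block]
        for s0 in range(0, n_stars, stars_per_block):
            macroblocks[(f0, s0)] = [row[s0:s0 + stars_per_block]
                                     for row in group]

    # Stage 2: for each group of stars, visit its macroblocks in frame
    # order and flush the completed light curves together.
    light_curves = [None] * n_stars
    for s0 in range(0, n_stars, stars_per_block):
        width = min(stars_per_block, n_stars - s0)
        curves = [[] for _ in range(width)]
        for f0 in range(0, n_frames, frames_per_block):
            for row in macroblocks[(f0, s0)]:
                for j, value in enumerate(row):
                    curves[j].append(value)
        light_curves[s0:s0 + width] = curves
    return light_curves


def read_to_seek_ratio(m, n_frames, n_stars, omega, tau):
    """Sequential read time of one macroblock divided by one seek time."""
    block_records = (m / n_frames) * (m / n_stars)
    return block_records / (omega * tau)
```

In a real reduction, `frames_per_block` would correspond to $M/N_\star$ and `stars_per_block` to $M/N_F$, so that both a group of frames (stage 1) and a group of complete light curves (stage 2) fit into the memory.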

2.12.15 Archiving - fizip and fiunzip 

Due to the large disk space required to store the raw, 
calibrated and derived (registered and/or subtracted) 
frames, it is essential to compress and archive the image 
files that are rarely used. The purpose of the fizip and 
fiunzip programs is to compress and decompress primary 
FITS data while keeping the changes in the primary FITS 
header minimal. The compressed data is stored in a 
one-dimensional 8 bit (BITPIX=8, NAXIS=1) array, therefore 
these keywords do not reflect the original image dimension 
or data type. 

All of the other keywords are untouched. Some auxiliary 
information on the compression is stored in the keywords 
starting with "FIZIP"; the contents of these keywords 
depend on the involved compression method. fizip refuses 
to compress a FITS file if such keywords exist in the 
primary header. 

In practice, fizip and fiunzip refer to the same program 
(namely, fiunzip is a symbolic link to fizip) since the 
algorithms involved in the compression and decompression 
refer to the same codebase or external library. fizip and 
fiunzip support well known compression algorithms, such 
as the GNU zip ("gzip") and the block-sorting file 
compressor (also known as "bzip2") algorithm. 

These compression algorithms are lossless. However, 
fizip supports rounding the input pixel values to the nearest 
integer or to the nearest fraction of some power of 2. 
Since the common representation of floating-point real numbers 
yields many zero bits if the number itself is an integer 
or a multiple of a power of 2 (including fractional multiples), 
the compression is more effective if this kind of rounding 
is done before the compression. This "fractional rounding" 
yields data loss. However, if the difference between the 
original and the rounded values is comparable to or less than the 
readout noise of the detector, such compression does not 
affect the quality of the further processing (e.g. photometry). 
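
The effect of the "fractional rounding" can be demonstrated with a minimal Python sketch. It does not use fizip itself: the pixel values are synthetic (a hypothetical sky level of 1000 ADU with Gaussian noise), the data are packed as 32-bit big-endian floats, and zlib stands in for the gzip-type compressor. The helper names are illustrative only.

```python
import random
import struct
import zlib


def round_to_fraction(value, bits):
    """Round value to the nearest multiple of 2**-bits."""
    scale = 2 ** bits
    return round(value * scale) / scale


def compressed_size(values):
    """Size in bytes of the zlib-compressed big-endian float32 array."""
    raw = struct.pack(">%df" % len(values), *values)
    return len(zlib.compress(raw, 9))


# Synthetic "image": 20000 pixels around 1000 ADU with 10 ADU noise.
rng = random.Random(0)
pixels = [1000.0 + rng.gauss(0.0, 10.0) for _ in range(20000)]

# Round every pixel to the nearest 1/4 ADU before compression.
rounded = [round_to_fraction(p, 2) for p in pixels]

size_raw = compressed_size(pixels)
size_rounded = compressed_size(rounded)
max_error = max(abs(a - b) for a, b in zip(pixels, rounded))
print(size_raw, size_rounded, max_error)
```

The rounded array compresses to fewer bytes because the low-order mantissa bits become zero, while the rounding error stays below 1/8 ADU, well under the assumed noise level.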

2.12.16 Generic arithmetic evaluation, regression and 
data analysis - lfit 

Modeling of data is a prominent step in the analysis and 
interpretation of astronomical observations. In this section, 
a standalone command line driven tool named lfit is 
introduced, designed for both interactive and batch processed 
regression analysis as well as generic arithmetic evaluation. 
This tool is built on top of the libpsn library^19, 

^19 http://libpsn.sf.net, developed by the author 
Table 6. Algorithms supported by lfit and their respective requirements for the model function. The first column refers to the internal 
and command line identifier of the algorithms. The second column shows whether the method requires the parametric derivatives of 
the model functions in an analytic form or not. The third column indicates whether in the cases when the method requires parametric 
derivatives, should the model function be linear in all of the parameters. 

Code      Derivatives   Linearity    Method or algorithm 
L/CLLS    yes           yes          Classic linear least squares method 
N/NLLM    yes           no           (Nonlinear) Levenberg-Marquardt algorithm 
U/LMND    no            no           Levenberg-Marquardt algorithm employing numeric parametric derivatives 
M/MCMC    no            no           Classic Markov Chain Monte-Carlo algorithm^1 
X/XMMC    yes           no           Extended Markov Chain Monte-Carlo^2 
K/MCHI    no            no           Mapping the values on a grid (a.k.a. "brute force" minimization) 
D/DHSX    optional^3    no           Downhill simplex 
E/EMCE    optional^4    optional^4   Uncertainties estimated by refitting to synthetic data sets 
A/FIMA    yes           no           Fisher Information Matrix Analysis 



^1 The implemented transition function is based on the Metropolis-Hastings algorithm and the optional Gibbs sampler. The transition amplitudes must 
be specified initially. Iterative MCMC can be implemented by subsequent calls of lfit, involving the previous inverse statistical variances of each parameter 
as the transition amplitudes for the next chain. 

^2 The program also reports a summary of the sanity checks (such as correlation lengths, Fisher covariance, statistical covariance, transition 
probabilities and the best fit value obtained by an alternate, usually the downhill simplex, minimization). 

^3 The downhill simplex algorithm may use the parametric derivatives to estimate the Fisher/covariance matrix for the initial conditions in order to define 
the control points of the initial simplex. Otherwise, if the parametric derivatives do not exist, the user should specify the "size" of the initial simplex 
during the invocation of lfit. 

^4 Some of the other methods (esp. CLLS, NLLM, DHSX, in practice) can be used during the minimization process of the original data and the individual 
synthetic data sets. 



a collection of functions managing symbolic arithmetic ex- 
pressions. This library provides both the back-end for func- 
tion evaluation as well as analytical calculations of partial 
derivatives. Partial derivatives are required by most of the 
regression methods (e.g. linear and non-linear least squares 
fitting) and uncertainty estimations (e.g. Fisher analysis). 
The program features many built-in functions related to spe- 
cial astrophysical problems. Moreover, it allows the end-user 
to extend the capabilities during run-time using dynamically 
loaded libraries. 

In general, lfit is used extensively in the data 
reduction steps of the HATNet project. The program acts 
both in the main "discovery" pipeline and in the 
characterization of follow-up data, including photometric 
and radial velocity measurements. Currently, lfit 
exclusively implements the EPD algorithm (including the 
normal, the reconstructive and the simultaneous modes) 
as well as the simultaneous TFA algorithm (see e.g. 
Bakos, Torres, Pál et al. 2009). 



2.12.16.1 User interface and built-in regression methods 

Due to the high modularization and freedom 
in its user interface, the program lfit allows the user to 
compare the results of different regression analysis 
techniques. The program features 9 built-in algorithms at the 
moment, including the classic linear least squares 
minimization (Press et al. 1993), the non-linear methods 
(Levenberg-Marquardt, downhill simplex, see also Press et al. 1992), 
various methods providing an a posteriori distribution for the 
adjusted parameters, such as Markov Chain Monte-Carlo 
(Ford 2004), or the method of refitting to synthetic data 
sets (Press et al. 1992). The program is also capable of 
deriving the covariance or correlation matrix of the parameters 
involving the Fisher information analysis (Finn 1992). The 
comprehensive list of the supported algorithms can be found 
in Table 6. 

The basic concepts of lfit are shown in Fig. 27 in the form 
of a complete example for linear regression. 
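
The linear regression example of Fig. 27 can be cross-checked independently of lfit. The Python sketch below solves the normal equations of the straight-line fit for the six data points of line.dat and estimates the parameter uncertainties by scaling the residual variance so that $\chi^2$ per degree of freedom is unity. That this scaling is what the uncertainty estimation of lfit does internally is an assumption made for the sketch; the function name `fit_line` is hypothetical.

```python
import math


def fit_line(xs, ys):
    """Unweighted least squares fit of y = a*x + b with uncertainties."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx

    # Solution of the 2x2 normal equations.
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det

    # Residual variance for chi^2 per degree of freedom equal to unity.
    rss = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    sigma_a = math.sqrt(s2 * n / det)
    sigma_b = math.sqrt(s2 * sxx / det)
    return a, b, sigma_a, sigma_b


# The data points of the file line.dat shown in Fig. 27.
xs = [2, 3, 4, 5, 6, 7]
ys = [8.10, 10.90, 14.05, 16.95, 19.90, 23.10]
a, b, sigma_a, sigma_b = fit_line(xs, ys)
print("%.5f %.5f" % (a, b))
print("%.7f %.6f" % (sigma_a, sigma_b))
```

The resulting slope, intercept and uncertainties agree with the values printed by lfit in Fig. 27 to the quoted number of digits.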



2.12.16.2 Built-in functions related to astronomical data analysis 

The program lfit provides various 
built-in functions related to astronomical data analysis, 
especially ones that are required by exoplanetary research. All 
of these functions are some sort of "base functions" with a 
few parameters, from which one can easily form more useful 
ones using the capabilities of lfit. Good examples are the 
eccentric offset functions p(λ, k, h) and q(λ, k, h) (Sec. 4.3), 
which have only three parameters but from which the functions 
related to the radial velocity analysis can easily be defined. 
The full list of these special functions can be 
found in Table 7. The actual implementation of the above 
mentioned radial velocity model functions can be found in 
Chapter 4, in Fig. 39. 
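
As an illustration of what such a "base function" computes, the sketch below evaluates the geometry behind the uniform-source occultation function described in Table 7 (ntiu): the fraction of a uniformly bright disk of unit radius that is covered by a dark disk of radius p whose center lies at the distance z. This is the standard overlap area of two circles; it is not taken from the lfit source code, and the normalization and sign convention of the built-in function are assumptions here.

```python
import math


def _acos_clamped(value):
    """acos() that tolerates rounding errors just outside [-1, 1]."""
    return math.acos(max(-1.0, min(1.0, value)))


def uniform_occultation(p, z):
    """Covered fraction of a uniform unit disk (dark disk: radius p, distance z)."""
    z = abs(z)
    if z >= 1.0 + p:
        return 0.0                      # the disks do not overlap
    if z <= abs(1.0 - p):
        return min(p * p, 1.0)          # one disk lies fully inside the other
    # Partial overlap: area of the lens formed by the two circles.
    k0 = _acos_clamped((p * p + z * z - 1.0) / (2.0 * p * z))
    k1 = _acos_clamped((1.0 - p * p + z * z) / (2.0 * z))
    root = math.sqrt(max(0.0, 4.0 * z * z - (1.0 + z * z - p * p) ** 2))
    return (p * p * k0 + k1 - 0.5 * root) / math.pi


# A planet with p = 0.1 at mid-transit, at the limb and out of transit.
for z in (0.0, 0.95, 1.0, 1.2):
    print(z, uniform_occultation(0.1, z))
```

More complex model functions (for instance a full transit light curve as a function of time) would then be composed from such a base function by expressing z through the orbital parameters.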



2.12.16.3 Extended Markov Chain Monte-Carlo 

In this section we discuss in more detail one of the built-in 
methods, which combines a Markov Chain Monte-Carlo 
algorithm with the parametric derivatives of the model functions 
in order to yield faster convergence and more reliable results, 
especially in the cases of highly correlated parameters. 

The main concept of the MCMC algorithm (see e.g. 
Ford 2004) is to generate an a posteriori probability 
distribution of the adjusted parameters. It is based on random 
walks in the parameter space as follows. In each step, one 
draws an alternate parameter vector from an a priori 
distribution and then evaluates the merit function $\chi^2$. If the value 
of $\chi^2$ decreases, we accept the transition (since the newly 
drawn parameter vector represents a better fit), otherwise 
the transition is accepted with a certain probability (derived 
from the increment in $\chi^2$). The final distribution of the 
parameters depends on both the a priori distribution and the 
probability function used when the value of $\Delta\chi^2$ is 
positive. The main problem of the MCMC method is that the 
a posteriori probability distribution can only be estimated 
if the a priori distribution is chosen well, but initially we do 
not have any hint for either distribution. The idea behind 
MCMC is to derive multiple chains, by taking the a posteri- 
Table 7. Basic functions found in the built-in astronomical extension library. These functions cover the fields of simple radial velocity 
analysis, some aspects of light curve modelling and data reduction. These functions are a kind of "common denominators", i.e. they do 
not provide a direct possibility for applications but complex functions can be built on the top of them for any particular usage. All of 
the functions below with the exception of hjd() and bjd() have partial derivatives that can be evaluated analytically by lfit. 

Function              Description 
hjd(JD, α, δ)         Function that calculates the heliocentric Julian date from the Julian day JD and the celestial 
                      coordinates α (right ascension) and δ (declination). 
bjd(JD, α, δ)         Function that calculates the barycentric Julian date from the Julian day JD and the celestial 
                      coordinates α (right ascension) and δ (declination). 
ellipticK(k)          Complete elliptic integral of the first kind. 
ellipticE(k)          Complete elliptic integral of the second kind. 
ellipticPi(k, n)      Complete elliptic integral of the third kind. 
eoq(λ, k, h)          Eccentric offset function, 'q' component. The arguments are the mean longitude λ, in radians, and 
                      the Lagrangian orbital elements k = e cos ϖ, h = e sin ϖ. 
eop(λ, k, h)          Eccentric offset function, 'p' component. 
ntiu(p, z)            Normalized occultation flux decrease. This function calculates the flux decrease during the eclipse 
                      of two spheres when one of the spheres has uniform flux distribution and the other one by which the 
                      former is eclipsed is totally dark. The bright source is assumed to have a unity radius while the 
                      occulting disk has a radius of p. The distance between the centers of the two disks is z. 
ntiq(p, z, γ1, γ2)    Normalized occultation flux decrease when the eclipsed sphere has a non-uniform flux distribution 
                      modelled by a quadratic limb darkening law. The limb darkening is characterized by γ1 and γ2. 



ori distribution of the previous chain as the input (a priori) 
distribution for the upcoming chain. In regular cases, the 
chains converge to a final distribution after some iterations 
and therefore the last one can be accepted as a final result. 
In the literature, several attempts are known to define an 
a priori transition function (see also Ford 2004). Here we 
give a simple method that not only provides a good hint 
for the a priori distribution but also yields several independent 
sanity checks that are then used to verify the convergence 
of the chain. The transition function used by this extended 
Markov Chain Monte-Carlo algorithm (XMMC) is a Gaussian 
distribution whose covariances are derived from the 
Fisher covariance matrix (Finn 1992). The sanity checks are 
then the following: 

• The resulting parameter distribution should have nearly 
the same statistical covariance as the analytical 
covariance^20. 

• The autocorrelation lengths of the chain parameters 
have to be small, i.e. about 1-2 steps. Chains that failed to 
converge have significantly larger autocorrelation lengths. 

• The transition probability has to be consistent with 
the theoretical probabilities. This theoretical probability 
depends only on the number of adjusted parameters. 

• The statistical centroid (mode) of the distribution must 
agree with both the best fit parameter derived from alternate 
methods (such as downhill simplex) and the chain 
element with the smallest $\chi^2$. 
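
The idea of a Gaussian transition function derived from the Fisher covariance matrix, together with the first and third sanity checks listed above, can be sketched in Python for the simplest possible case, the straight-line model of Fig. 27. This is not the XMMC code of lfit. The sketch assumes a random-walk proposal centered on the current chain element (the text does not specify this detail), a uniform measurement uncertainty `SIGMA` chosen by hand, and the usual exp(-Δχ²/2) acceptance probability; all names are hypothetical.

```python
import math
import random

X = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
Y = [8.10, 10.90, 14.05, 16.95, 19.90, 23.10]
SIGMA = 0.1059  # assumed uniform uncertainty of each y value


def chi2(a, b):
    return sum(((y - (a * x + b)) / SIGMA) ** 2 for x, y in zip(X, Y))


def fisher_covariance():
    """Inverse Fisher matrix of y = a*x + b (derivatives are x and 1)."""
    faa = sum(x * x for x in X) / SIGMA ** 2
    fab = sum(X) / SIGMA ** 2
    fbb = len(X) / SIGMA ** 2
    det = faa * fbb - fab * fab
    return fbb / det, -fab / det, faa / det  # C_aa, C_ab, C_bb


def run_chain(a0, b0, n_steps, seed=1):
    rng = random.Random(seed)
    c_aa, c_ab, c_bb = fisher_covariance()
    # Cholesky factor of the covariance: correlated Gaussian steps.
    l11 = math.sqrt(c_aa)
    l21 = c_ab / l11
    l22 = math.sqrt(c_bb - l21 * l21)
    a, b, c = a0, b0, chi2(a0, b0)
    chain, accepted = [], 0
    for _ in range(n_steps):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a_new, b_new = a + l11 * u, b + l21 * u + l22 * v
        c_new = chi2(a_new, b_new)
        # Always accept a better fit; otherwise accept with exp(-dchi2/2).
        if c_new <= c or rng.random() < math.exp(-0.5 * (c_new - c)):
            a, b, c = a_new, b_new, c_new
            accepted += 1
        chain.append((a, b))
    return chain, accepted / n_steps


def mean_and_std(values):
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


chain, acceptance = run_chain(2.99714, 2.01286, 20000)
mean_a, std_a = mean_and_std([p[0] for p in chain])
mean_b, std_b = mean_and_std([p[1] for p in chain])
fisher_sigma_a = math.sqrt(fisher_covariance()[0])
fisher_sigma_b = math.sqrt(fisher_covariance()[2])
print(acceptance, std_a / fisher_sigma_a, std_b / fisher_sigma_b)
```

For this linear model the statistical scatter of the chain should be close to the Fisher (analytical) uncertainties, which is the kind of agreement the first sanity check looks for.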

The method of XMMC has some disadvantages. First, the 
transition probabilities decrease exponentially as the number 
of adjusted parameters increases, therefore the required 
computational time can be exceptionally high in some cases. 
The Gibbs sampler (used in the classic MCMC) provides a 
roughly constant transition probability. Second, the 
derivation of the Fisher covariance matrix requires the knowledge 
of the parametric derivatives of the merit function. In the 
actual implementation of lfit, one can use the 
method of XMMC only if the parametric derivatives are known 
in advance in an analytical form. Otherwise, the XMMC 
algorithm cannot be applied at all. 

^20 In practice, the program lfit reports the individual 
uncertainties of the parameters and the correlation matrix. Of course, 
this information can easily be converted to a covariance matrix 
and vice versa. 

However, in the case of HATNet data analysis, we found 
the method of XMMC to be highly efficient and we used 
it in several analyses related to the discoveries. Moreover, 
the most important functions concerning this analysis, 
such as light curve and radial velocity model functions, have 
known analytic partial derivatives. These derivatives for 
transit light curve model functions can be found in Pál 
(2008). An analytic formalism for radial velocity modelling 
is discussed in Sec. 4.3 and some additional related details 
and applications are presented in Pál (2009). In this thesis 
(in Chapter 3) a detailed example is given on the application 
of the XMMC algorithm in the analysis of the HAT-P-7(b) 
planetary system. 



2.13 Analysis of photometric data 

In this section we briefly describe how the previously 
discussed algorithms and the respective implementations are 
used in the practice of photometric data reduction. The 
concepts for the major steps in the photometry are roughly the 
same for the HATNet and follow-up data, however, the 
latter has three characteristics that make the processing more 
convenient. First, the total number of frames is definitely 
smaller: a couple of hundred frames for a single night or 
event, while there are thousands or tens of thousands of 
frames for a typical observation of a certain HATNet field. 
Second, the number of stars on each individual frame is also 
smaller (a few hundred instead of tens or hundreds of 
thousands). Third, during the reduction of follow-up photometric 
data, we have an expectation for the signal shape. The signal 
can be easily obtained even from lower quality data and/or 
# This command just prints the content of the file ''line.dat'' to the standard output: 
$ cat line.dat 
2   8.10 
3  10.90 
4  14.05 
5  16.95 
6  19.90 
7  23.10 
# Regression: this command fits a ''straight line'' to the above data: 
$ lfit -c x,y -v a,b -f "a*x+b" -y y line.dat 
      2.99714      2.01286 
# Evaluation: this command evaluates the model function assuming the parameters to be known: 
$ lfit -c x,y -v a=2.99714,b=2.01286 -f "x,y,a*x+b,y-(a*x+b)" -F %6.4g,%8.2f,%8.4f,%8.4f line.dat 
     2     8.10   8.0071   0.0929 
     3    10.90  11.0043  -0.1042 
     4    14.05  14.0014   0.0486 
     5    16.95  16.9986  -0.0486 
     6    19.90  19.9957  -0.0957 
     7    23.10  22.9928   0.1072 
$ lfit -c x,y -v a,b -f "a*x+b" -y y line.dat --err 
      2.99714      2.01286 
    0.0253144     0.121842 


Figure 27. These commands show the two basic operations of lfit: the first invocation of lfit fits a straight line, i.e. a model 
function of the form ax + b = y, to the data found in the file line.dat. This file is supposed to contain two columns, one for the x 
and one for the y values. The second invocation of lfit evaluates the model function. Values for the model parameters (a, b) are taken 
from the command line while the individual data points (x, y) are still read from the data file line.dat. The evaluation mode allows the 
user to compute (and print) arbitrary functions of the model parameters and the data values. In the above example, the model function 
itself and the fit residuals are computed and printed, following the read values of x and y. Note that the printed values are formatted for 
a minimal number of significant figures (%6.4g) or for a fixed number of decimals (%8.2f or %8.4f). The last command is roughly the same 
as the first command for regression, but the individual uncertainties are also estimated by normalizing the value of $\chi^2$ to unity. 



when some of the reduction steps are skipped (e.g. trend 
filtering or a higher order magnitude transformation). 

The schematics of a typical photometric pipeline (as 
used for HATNet data reductions) are shown in Fig. 28. It 
is clear from the figure that the steps of the reduction are 
the same up to astrometry, whether the fluxes are 
derived by normal (aperture) photometry or by the image 
subtraction method. In the first case, the astrometric 
solution is directly used to compute the aperture centroids for 
all objects of interest, while in the case of image subtraction, the 
image registration parameters are based on astrometry. 
After the instrumental magnitudes are obtained, the processing of 
the photometric files (including transposition, trend filtering 
and per-object light curve analysis) is the same again. In 
practice, both primary photometric methods yield fluxes for 
several apertures. Therefore, joint processing of various 
photometric data is also feasible since the subsequent steps do 
not involve additional information beyond the instrumental 
magnitudes. The only exception is that additional data can 
be involved in the EPD algorithm in the case of image 
subtraction photometry. Namely, the kernel coefficients $C_{ik\ell}$ can be 
added to the set of EPD parameters $p^{(i)}$ (see equation 84), 
by evaluating the spatial variations for each object: 

$$ p^{(i)} = \sum_{k,\ell} C_{ik\ell}\, x^k y^\ell, \qquad (94) $$

where (x, y) is the centroid coordinate of the actual object 
of interest. In the following two chapters, I discuss how the 
above outlined techniques are applied in the case of HATNet 
and follow-up data reductions. 
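
The evaluation in equation (94) amounts to computing a two-dimensional polynomial at the centroid of each object. A minimal Python sketch follows, assuming that the spatial variation of one kernel coefficient is stored as a mapping from the exponent pair (k, l) to its polynomial coefficient; the storage layout, the coefficient values and the function name are hypothetical, and any coordinate normalization used in practice is ignored.

```python
def epd_parameter(coeffs, x, y):
    """Evaluate sum over (k, l) of C_kl * x**k * y**l at the centroid (x, y)."""
    return sum(c * x ** k * y ** l for (k, l), c in coeffs.items())


# Hypothetical first-order spatial variation of one kernel coefficient.
coeffs = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): -0.25, (1, 1): 2.0}
value = epd_parameter(coeffs, 2.0, 4.0)
print(value)
```

One such value per kernel coefficient and per object would then be appended to the list of external parameters used in the decorrelation.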



3 HATNET DISCOVERIES 



In the past few years, the HATNet project announced 11 
discoveries and became one of the most successful 
initiatives searching for transiting extrasolar planets. In this 
chapter the procedures of the photometric measurements and 
the analysis of spectroscopic data (including radial velocity) are 
explained, emphasizing how the algorithms and programs 
were used in the data reduction and analysis. The 
particular example of the planetary system HAT-P-7(b) clearly 
demonstrates all of the necessary steps that are generally 
required for the detection and confirmation of transiting 
extrasolar planets. In Sec. 3.1 the issues related to the primary 
photometric detection are explained. Sec. 3.2 summarizes 
the follow-up observations, which are needed for the proper 
confirmation of the planetary nature. Mainly, the roles of 
these follow-up observations are threefold. First, 
photometric follow-up provides additional data in order to have a 
better estimation of the planetary parameters that are derived from 
the light curve of the system. Second, spectroscopic analysis 
yields additional information from which the planetary mass 
or the properties and physical parameters of the host star 
can be deduced. Third, analysis of follow-up data helps to 
exclude other scenarios that are likely to show photometric 
or spectroscopic variations similar to those of a transiting 
extrasolar planet. In Sec. 3.3 the methods are explained 
that we used to obtain the final planetary parameters. 
#!/bin/sh 

CATALOG=input.cat       # name of the reference catalog 
COLID=1                 # column index of object identifier (in the $CATALOG file) 
COLX=2                  # column index of the projected X coordinate (in the $CATALOG file) 
COLY=3                  # column index of the projected Y coordinate (in the $CATALOG file) 
COLMAG=4                # column index of object magnitude (in the $CATALOG file) 
COLCOLOR=5              # column index of object color (in the $CATALOG file) 
THRESHOLD=4000          # threshold for star detection 
GAIN=4.2                # combined gain of the readout electronics and the A/D converter in electrons/ADU 
MAGFLUX=10,10000        # magnitude/flux conversion 
APERTURE=5:8:8          # aperture radius, background area inner radius and thickness (all in pixels) 

mag_param=c0_00,c0_10,c0_01,c0_20,c0_11,c0_02,c1_00,c1_01,c1_10 
mag_funct="c0_00+c0_10*x+c0_01*y+0.5*(c0_20*x^2+2*c0_11*x*y+c0_02*y^2)+color*(c1_00+c1_10*x+c1_01*y)" 

for base in ${LIST[*]} ; do 
    fistar ${FITS}/$base.fits --algorithm uplink --prominence 0.0 --model elliptic \ 
        --flux-threshold $THRESHOLD --format id,x,y,s,d,k,amp,flux -o ${AST}/$base.stars 
    grmatch --reference $CATALOG --col-ref $COLX,$COLY --col-ref-ordering -$COLMAG \ 
        --input ${AST}/$base.stars --col-inp 2,3 --col-inp-ordering +8 \ 
        --weight reference,column=$COLMAG,magnitude,power=2 \ 
        --triangulation maxinp=100,maxref=100,conformable,auto,unitarity=0.002 \ 
        --order 2 --max-distance 1 \ 
        --comment --output-transformation ${AST}/$base.trans || continue 
    grtrans $CATALOG --col-xy $COLX,$COLY --col-out $COLX,$COLY \ 
        --input-transformation ${AST}/$base.trans --output - | \ 
    fiphot ${FITS}/$base.fits --input-list - --col-xy $COLX,$COLY --col-id $COLID \ 
        --gain $GAIN --mag-flux $MAGFLUX --aperture $APERTURE --disjoint-annuli \ 
        --sky-fit mode,iterations=4,sigma=3 --format IXY,MmBbS \ 
        --comment --output ${PHOT}/$base.phot 
    paste ${PHOT}/$base.phot ${PHOT}/$REF.phot $CATALOG | \ 
    lfit --columns mag:4,err:5,mag0:12,x:10,y:11,color:$((2*8+COLCOLOR)) \ 
        --variables $mag_param --function "$mag_funct" --dependent mag0-mag --error err \ 
        --output-variables ${PHOT}/$base.coeff 
    paste ${PHOT}/$base.phot ${PHOT}/$REF.phot | \ 
    lfit --columns mag:4,err:5,mag0:12,x:10,y:11,color:$((2*8+COLCOLOR)) \ 
        --variables $(cat ${PHOT}/$base.coeff) \ 
        --function "mag+($mag_funct)" --format %9.5f --column-output 4 | \ 
    awk '{ print $1, $2, $3, $4, $5, $6, $7, $8; }' > ${PHOT}/$base.tphot 
done 

for base in ${LIST[*]} ; do test -f ${PHOT}/$base.tphot && cat ${PHOT}/$base.tphot ; done | \ 
grcollect - --col-base 1 --prefix $LC/ --extension .lc 



Figure 29. A shell script demonstrating a complete working pipeline for aperture photometry. The input FITS files are read from 
the directory ${FITS} and their base names (without the *.fits extension) are supposed to be listed in the array ${LIST[*]}. These 
base names are then used to name the files storing data obtained during the reduction process. Files created by the subsequent calls 
of the fistar and grmatch programs are related to the derivation of the astrometric solution and the respective files are stored in the 
directory ${AST}. The photometry centroids are derived from the original input catalog (found in the file $CATALOG) and the astrometric 
transformation (plate solution, stored in the *.trans files). The results of the photometry are put into the directory ${PHOT}. Raw 
photometry is followed by the magnitude transformation. This branch involves additional common UNIX utilities such as paste and awk 
in order to match the current and the reference photometry as well as to filter and resort the output after the magnitude transformation. 
The derivation of the transformation coefficients is done by the lfit utility, which involves $mag_funct with the parameters listed in 
$mag_param. This example features a quadratic magnitude transformation and a linear color dependent correction (to cancel the effects of 
the differential refraction). The final light curves are created by the grcollect utility, which writes the individual files into the directory 
${LC}. 
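
The magnitude transformation used in the script of Fig. 29 is linear in its nine coefficients, which is why they can be derived by a linear least squares fit. The Python sketch below spells out the function defined in $mag_funct and its parametric derivatives, in the coefficient order of $mag_param. It only mirrors the formula of the script; the function names are hypothetical and nothing here reproduces the internals of lfit.

```python
def design_row(x, y, color):
    """Derivatives of the transformation with respect to the coefficients
    (c0_00, c0_10, c0_01, c0_20, c0_11, c0_02, c1_00, c1_01, c1_10)."""
    return [1.0, x, y,
            0.5 * x * x, x * y, 0.5 * y * y,
            color, color * y, color * x]


def mag_correction(coeffs, x, y, color):
    """Quadratic spatial term plus a color term that is linear in (x, y)."""
    return sum(c * d for c, d in zip(coeffs, design_row(x, y, color)))


# Hypothetical coefficient values, used only to exercise the functions.
coeffs = [0.1, 0.01, -0.02, 0.002, 0.001, -0.003, 0.05, 0.004, -0.006]
print(mag_correction(coeffs, 2.0, 3.0, 0.5))
```

Stacking one `design_row` per star gives the design matrix of the fit, and the dependent variable is the difference between the reference and the current instrumental magnitudes, as in the script.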



3.1 Photometric detection 

The HATNet telescopes HAT-7 and HAT-8 (HATNet; 
Bakos et al. 2002, 2004) observed HATNet field G154, 
centered at α = 19h12m, δ = +45°00', on a near-nightly basis 
from 2004 May 27 to 2004 August 6. Exposures of 5 minutes 
were obtained at a 5.5-minute cadence whenever 
conditions permitted; all in all 5140 exposures were secured, 
each yielding photometric measurements for approximately 
33,000 stars in the field down to I ~ 13.0. The field was 
observed in network mode, exploiting the longitude separation 
between HAT-7, stationed at the Smithsonian Astrophysical 
Observatory's (SAO) Fred Lawrence Whipple Observatory 
(FLWO) in Arizona (λ = 111° W), and HAT-8, installed on 
the rooftop of SAO's Submillimeter Array (SMA) building 
Figure 30. Light curve statistics for the field "G154", obtained by aperture photometry (left panel) and photometry based on the 
method of image subtraction (middle panel). The right panel shows the lower noise limit estimation derived from the Poisson- and 
background noise. Due to the strong vignetting of the optics, the effective gain varies across the image. Therefore, the distribution of the 
points on the right panel is not a clear thin line. Instead, the thickness of the line is approximately equivalent to a factor of ~ 2 between 
the noise level, indicating a highly varying vignetting of a factor of ~ 4. The star HAT-P-7 (GSC 03547-01402) is represented by the 
thick dot. The light curve scatter for this star has been obtained involving only out-of-transit data. This star is a prominent example 
where the method of image subtraction photometry significantly improves the light curve quality. 



atop Mauna Kea, Hawaii (λ = 155° W). We note that each 
light curve obtained by a given instrument was shifted to 
have a median value equal to the catalogue magnitude 
of the appropriate star, allowing us to merge light curves 
acquired by different stations and/or detectors. 

Following standard frame calibration procedures, 
astrometry was performed as described in Sec. 2.5 and aperture 
photometry results (see Sec. 2.7 and Sec. 2.12.13) 
were subjected to External Parameter Decorrelation (EPD, 
Sec. 2.10), and also to the Trend Filtering Algorithm (TFA; 
see Sec. 2.10 or Kovács, Bakos & Noyes 2005). We searched 
the light curves of field G154 for box-shaped transit signals 
using the BLS algorithm of Kovács, Zucker & Mazeh (2002). 
A very significant periodic dip in brightness was detected 
in the I ≈ 9.85 magnitude star GSC 03547-01402 (also 
known as 2MASS 19285935+4758102; α = 19h28m59.35s, 
δ = +47°58'10.2"; J2000), with a depth of ~ 7.0 mmag, 
a period of P = 2.2047 days and a relative duration (first 
to last contact) of q ≈ 0.078, equivalent to a duration of 
Pq ~ 4.1 hours. 
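
The quoted transit duration follows directly from the period and the relative duration given above; written out as plain arithmetic (no quantities beyond those in the text are assumed):

```latex
\[
  P\,q \approx 2.2047\,\mathrm{d} \times 0.078
       \approx 0.172\,\mathrm{d}
       \approx 0.172 \times 24\,\mathrm{h}
       \approx 4.1\,\mathrm{h}.
\]
```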

In addition, the star happened to fall in the overlapping 
area between fields G154 and G155. Field G155, centered at 
α = 19h48m, δ = +45°00', was also observed over an 
extended time between 2004 July 27 and 2005 September 20 
by the HAT-6 (Arizona) and HAT-9 (Hawaii) telescopes. We 
gathered 1220 and 10260 data-points, respectively (which 
independently confirmed the transit), yielding a total number 
of 16620 data-points. 

After the announcement and the publication of the 
planet HAT-P-7b (Pál et al. 2008a), all of the images for 
the fields G154 and G155 were re-analyzed by the method 
of image subtraction photometry. Based on the astrometric 
solution, the images were registered to the coordinate 
system of one of the images that was found to be a proper 
reference image (Sec. 2.6). From the set of registered frames, 
approximately a dozen were chosen to create 
a good signal-to-noise ratio master reference image for the 
image subtraction procedure. These frames were selected to 
be the sharpest ones, i.e. those where the overall profile sharpness 
parameter S (see Sec. 2.4.2) was the largest among the 
images (note that large S corresponds to small FWHM, i.e. 
ones where the Moon was below the horizon (see also Fig. [T^ 
and the related discussion). The procedure was repeated for 
both fields G154 and G155. The intensity levels of these in- 
dividual sharp frames were then transformed to the same 
level involving the program ficonv, with a formal kernel 
size of 1 X 1 pixels (Br = 0, M'kcrnci = 1, -R''^' = S^°°^). Such 
an intensity level transformation corrects for the changes
in the instrumental stellar brightnesses due to the varying
airmass, transparency and background level. These images
were then combined (Sec. 2.12.4) in order to have a single
master convolution reference image. This step was performed
for both of the fields. The reference images were then
used to derive the optimal convolution transformation, and
simultaneously the residual ("subtracted") images were also
obtained by ficonv. For each individual object image, both
the result of the convolution kernel fit and the residual
image were saved to files for further processing. For the fit, we
employed a discrete kernel basis with the size of 7 × 7
pixels and allowed a spatial variation of 4th polynomial order
for both the kernel parameters and the background level.
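The selection of the reference frames described above (sharpest frames, Moon below the horizon) can be sketched as follows. This is a minimal illustration and not the actual pipeline: the `Frame` record, the function name and the sample values are hypothetical, and only the two selection criteria are taken from the text.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    name: str
    sharpness: float       # overall profile sharpness parameter S
    moon_altitude: float   # degrees; negative means below the horizon


def select_reference_frames(frames, count=12):
    """Return the `count` sharpest frames taken with the Moon below the horizon."""
    dark = [f for f in frames if f.moon_altitude < 0.0]
    return sorted(dark, key=lambda f: f.sharpness, reverse=True)[:count]


# Hypothetical frame list, for illustration only.
frames = [
    Frame("frame-a", 0.52, -10.0),
    Frame("frame-b", 0.61, 25.0),   # sharp, but the Moon is up: rejected
    Frame("frame-c", 0.58, -30.0),
    Frame("frame-d", 0.47, -5.0),
]
selected = select_reference_frames(frames, count=2)
print([f.name for f in selected])
```

Filtering before sorting keeps the two criteria independent, so a different frame count or an additional rejection rule can be added without touching the ranking.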



The astrometric solutions have been already obtained at this 
point since the source identification and the centroid coordinates 
were already required earlier by aperture photometry. 
Figure 31. Stamps showing the vicinity of the star HAT-P-7. All of the stamps have the same size, covering an area of 15.7' × 15.7' on
the sky and centered on HAT-P-7. The left panel is taken from the POSS-1 survey (available, e.g. from the STScI Digitized Sky Survey
web page). The middle panel shows the same area, as the HATNet telescopes see it. This stamp was cut from the photometric reference
image (as it was used for the image subtraction process), that was derived from the ~ 20 sharpest and cleanest images of the HAT-8
telescope. The right panel shows the convolution residual images averaged over the ~ 160 frames acquired by the HAT-8 telescope during
the transit. The small dip at the center of the image can be seen well. Some residual structures at the positions of brighter stars are also
present.



Due to the sharp profiles (the profile FWHMs were between
2.0 and 2.4), this relatively small kernel size was sufficient for
our purposes. The residuals on the subtracted images were
subjected to aperture photometry, based on the considerations
discussed in Sec. 2.9. For proper image subtraction-based
photometry, one needs to derive and to use the fluxes
on the reference image as well. These fluxes were derived
using aperture photometry, and the instrumental raw
magnitudes were transformed to the catalogue magnitudes with
a fourth order polynomial transformation. The residual of
this fit was nearly 0.05 mag for both fields, thus the fluxes
of the individual stars have been well determined, and this
transformation yielded proper reference fluxes even for the
faint and the blended stars. The results of the image
subtraction photometry were then processed similarly to the
normal aperture photometry results (see also Fig. 28), and
the respective light curves were de-trended involving both
the EPD and TFA algorithms.

For a comparison, the light curve residuals for the normal
aperture photometry and the image subtraction photometry
are plotted on the left and middle panels of Fig. 30.
In general, the image subtraction photometry yielded light
curve residuals smaller by a factor of ~ 1.2 - 1.5. The gain
achieved by the image subtraction photometry is larger for
the fainter stars. It is important to note that in the case
of the star HAT-P-7, the image subtraction photometry
improved the photometric quality by a factor of ~ 1.8: the
rms of the out-of-transit section in the aperture photometry
light curve was 6.75 mmag while the image subtraction
method yielded an rms of 3.72 mmag. The lower limit of the
intrinsic noise of this particular star is 2.8 mmag (see also
the right panel of Fig. 30). In Fig. 31 we display some image
stamps from the star HAT-P-7 and its neighborhood. Since
the dip of ~ 7 mmag during the transits of HAT-P-7b is only
~ 2 times larger than the overall rms of the light curve,
individual subtracted frames do not significantly show the

In the case of a star having periodic dips in its light curve, the 
scatter is derived only from the out-of-transit sections. 



"hole" at the centroid position of the star, especially because 
this weak signal is distributed among several pixels. Therefore,
on the right panel of Fig. 31 all of the frames acquired
by the telescope HAT-8 during the transit have been averaged
in order to show a clear visual detection of the transit.
Although the star HAT-P-7 is well isolated, such visual
analysis of image residuals can be relevant when the signal is
detected for stars whose profiles are significantly merged. In
such cases, either the visual analysis or a more precise
quantification of this "negative residual" (e.g. by employing the
star detection and characterization algorithms of Sec. 2.4)
can help to distinguish which star is the variable.
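The benefit of averaging the in-transit residual frames can be estimated with a simple signal-to-noise argument. The sketch below assumes uncorrelated (white) noise, so that the noise of an average of N frames scales as 1/sqrt(N); it also assumes that the per-frame scatter equals the out-of-transit rms quoted above and uses the approximate frame count from the caption of Fig. 31. Correlated noise would reduce the actual gain.

```python
import math

# Values quoted in the text; the white-noise scaling is an assumption.
dip_mmag = 7.0             # transit depth of HAT-P-7b
rms_mmag = 3.72            # out-of-transit rms (image subtraction photometry)
frames_in_transit = 160    # approximate number of averaged HAT-8 frames

snr_single = dip_mmag / rms_mmag
snr_average = snr_single * math.sqrt(frames_in_transit)

print(f"single frame S/N:   {snr_single:.1f}")
print(f"averaged frame S/N: {snr_average:.1f}")
```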

The combined HATNet light curve, yielded by the image
subtraction photometry and de-trended by the EPD and
TFA, is plotted in Fig. 32. Superimposed on these plots is
our best fit model (see Sec. 3.3). We note that TFA was run
in signal reconstruction mode, i.e. systematics were iteratively
filtered out from the observed time series assuming
that the underlying signal is a trapeze-shaped transit (see
Sec. 2.10 and Kovács, Bakos & Noyes 2005, for additional
details). We note that fields G154 and G155 both intersect
the field of view of the Kepler mission (Borucki et al. 2007),
and more importantly, HAT-P-7 lies in the Kepler field.

3.2 Follow-up observations 

3.2.1 Reconnaissance spectroscopy 

Following the HATNet photometric detection, HAT-P-7
(then a transit candidate) was observed spectroscopically
with the CfA Digital Speedometer (DS, see Latham 1992)
at the FLWO 1.5 m Tillinghast reflector, in order to rule
out a number of blend scenarios that mimic planetary transits
(e.g. Brown 2003; O'Donovan et al. 2007), as well as to
characterize the stellar parameters, such as surface gravity,
effective temperature, and rotation. Four spectra were
obtained over an interval of 29 days. These observations cover
45 Å in a single echelle order centered at 5187 Å, and have
a resolving power of λ/Δλ ≈ 35,000. Radial velocities were
derived by cross-correlation, and have a typical precision of
Figure 28. Flowchart of the typical photometric reduction
pipeline. Each empty box represents a certain step of the data 
processing that requires non-negligible amount of computing re- 
sources. Filled boxes represent the type of data that is only used 
for further processing, thus the four major steps of the reduction 
are clearly distinguishable. See text for further details. 



1 km s⁻¹. Using these measurements, together with
collaborators, we have ruled out an unblended companion of stellar
mass (e.g. an M dwarf orbiting an F dwarf), since the radial
velocities did not show any variation within the uncertainties.
The mean heliocentric radial velocity of HAT-P-7 was
measured to be -11 km s⁻¹. Based on an analysis similar
to that described in Torres et al. (2002), the DS spectra
indicated that the host star is a slightly evolved dwarf with
log g = 3.5 (cgs), T_eff = 6250 K and v sin i ≈ 6 km s⁻¹.



3.2.2 High resolution spectroscopy 

For the characterization of the radial velocity variations and
for the more precise determination of the stellar parameters,
we obtained 8 exposures with an iodine cell, plus one
iodine-free template, using the HIRES instrument (Vogt et al.
1994) on the Keck I telescope, Hawaii, between 2007 August
24 and 2007 September 1. The width of the spectrometer
slit was 0".86, resulting in a resolving power of λ/Δλ ≈ 55,000,
while the wavelength coverage was ~ 3800 - 8000 Å. The
iodine gas absorption cell was used to superimpose a dense
forest of I2 lines on the stellar spectrum and establish an
accurate wavelength fiducial (see Marcy & Butler 1992).
Relative radial velocities in the Solar System barycentric frame
were derived as described by Butler et al. (1996), incorporating
full modeling of the spatial and temporal variations of
the instrumental profile. The final radial velocity data and
their errors are listed in Table 8. The folded data, with our
best fit (see Sec. 3.3.2) superimposed, are plotted in Fig. 36.



3.2.3 Photometric follow-up observations 

Partial photometric coverage of a transit event of HAT-P-7
was carried out in the Sloan z-band with the KeplerCam
CCD on the 1.2 m telescope at FLWO, on 2007 November
2. The total number of frames taken of HAT-P-7 was
514, with a cadence of 28 seconds. During the reduction of
the KeplerCam data, we used the following method. After
bias and flat calibration of the images, an astrometric
transformation (in the form of first order polynomials)
between the ~ 450 brightest stars and the 2MASS catalog
was derived, as described in Sec. 2.5, yielding a residual of
~ 0.2 - 0.3 pixel. Aperture photometry was then performed
using a series of apertures with radii of 4, 6 and 8 pixels
in fixed positions calculated from this solution and the
actual 2MASS positions. The instrumental magnitude
transformation was obtained using ~ 350 stars on a frame taken
near culmination of the field. The transformation fit was
initially weighted by the estimated photon- and background-noise
error of each star, then the procedure was repeated by
weighting with the inverse variance of the light curves. From
the set of apertures we have chosen the aperture for which
the out-of-transit (OOT) rms of HAT-P-7 was the smallest;
the radius of this aperture is 6 pixels. The resulting light
curve has been presented in the discovery paper of Pál et al.
(2008a). More recently, on 2008 July 30, we obtained an
additional complete light curve for the transit of HAT-P-7b,
also in the Sloan z-band with the KeplerCam CCD.

The two follow-up light curves from 2007 November 2
and 2008 July 30 were then de-correlated against trends using
the complete data, involving a simultaneous fit for the
light curve model function parameters and the EPD parameters
(see also Sec. 3.3). These fits yielded a light curve with
an overall rms of 1.83 mmag and 4.23 mmag for these two
nights, respectively. In both cases, the cadence of the
individual photometric measurements was 28 seconds. For the
first night the residual scatter of 1.83 mmag is a bit larger
than the expected rms of 1.5 mmag, derived from the photon
noise (1.2 mmag) and scintillation noise (which has an
expected amplitude of 0.8 mmag, based on the observational
conditions and the calculations of Young 1967), possibly
due to unresolved trends and other noise sources. For
the second night, the photometric quality was significantly
Figure 32. Upper left panel: the complete light curve of HAT-P-7 with all of the 16620 points, unbinned instrumental I-band photometry
obtained with four telescopes of HATNet (see text for details), and folded with the period of P = 2.2047298 days (the result of a joint
fit to all available data, Sec. 3.3.2). The superimposed curve shows the best model fit using quadratic limb darkening. Right panel: The
transit zoomed-in (3150 data points are shown). Lower left panel: same as the right panel, with the points binned with a bin size of 0.004
in days.
Figure 33. Left panel: unbinned instrumental Sloan z-band partial transit photometry acquired by the KeplerCam at the FLWO 1.2 m
telescope on 2007 November 2 and 2008 July 30; superimposed is the best-fit transit model light curve. Right panel: the difference 
between the KeplerCam observation and model (on the same vertical scale). 



worse, due to the high variations in the transparency. The
resulting light curves are shown in Fig. 33, superimposed
with our best fit model (Sec. 3.3).
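The expected rms for the first night follows from adding the photon noise and the scintillation noise in quadrature. The sketch below repeats this arithmetic with the values quoted above and also estimates how much additional, unexplained noise would account for the observed scatter; treating all noise sources as independent is an assumption of this estimate.

```python
import math

# Noise terms for the night of 2007 November 2, in mmag (from the text).
photon_noise = 1.2
scintillation_noise = 0.8
observed_rms = 1.83

# Independent noise sources add in quadrature.
expected_rms = math.hypot(photon_noise, scintillation_noise)

# Extra noise needed to reach the observed scatter, again in quadrature.
excess_noise = math.sqrt(observed_rms**2 - expected_rms**2)

print(f"expected rms: {expected_rms:.2f} mmag")
print(f"excess noise: {excess_noise:.2f} mmag")
```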

3.2.4 Excluding blend scenarios 

Following [Torres et all ^0^, we explored the possibility 
that the measured radial velocities are not real, but in- 
stead caused by distortions in the spectral line profiles due 



For 2007 November 2, the scatter of the raw magnitudes was
~ 14 mmag while on the night of 2008 July 30, the raw magnitude
rms was more than 15 times higher, nearly 0.24 mag.



to contamination from a nearby unresolved eclipsing binary.
In that case the "bisector span" of the average spectral
line should vary periodically with amplitude and phase
similar to the measured velocities themselves (Queloz et al.
2001; Mandushev et al. 2005). We cross-correlated each
Keck spectrum against a synthetic template matching the
properties of the star (i.e. based on the SME results, see
Sec. 3.3.4), and averaged the correlation functions over all
orders blueward of the region affected by the iodine lines.
From this representation of the average spectral line profile
we computed the mean bisectors, and as a measure of the
line asymmetry we computed the "bisector spans" as the
velocity difference between points selected near the top and
bottom of the mean bisectors (Torres et al. 2005). If the
velocities were the result of a blend with an eclipsing binary,
we would expect the line bisectors to vary in phase with
the photometric period with an amplitude similar to that of
the velocities. Instead, we detect no variation in excess of
the measurement uncertainties (see Fig. 36). We have also
tested the significance of the correlation between the radial
velocity and the bisector variations. Therefore, we conclude
that the velocity variations are real and that the star is
orbited by a Jovian planet. We note here that the mean
bisector span ratio relative to the radial velocity amplitude is
the smallest (~ 0.026) among all the HATNet planets,
indicating an exceptionally high confidence that the RV signal
is not due to a blend with an eclipsing binary companion.

3.3 Analysis 

The analysis of the available data was done in four steps.
First, an independent analysis was performed on the HATNet,
the radial velocity (RV) and the high precision photometric
follow-up (FU) data, respectively. Analysis of the
HATNet data yielded an initial value for the orbital period
and transit epoch. The initial period and epoch were used
to fold the RVs, and phase them with respect to the predicted
transit time for a circular orbit. The HATNet and the
RV epochs together yield a more accurate period, since the
time difference between the discovery light curve and the
RV follow-up is fairly long: more than 3 years. Using this
refined period, we can extrapolate to the expected center
of the KeplerCam partial transit, and therefore obtain a fit
for the two remaining key parameters describing the light
curve: a/R⋆, where a is the semi-major axis for a circular
orbit, and the impact parameter b ≡ (a/R⋆) cos i, where i is
the inclination of the orbit.

Second, using as starting points the initial values as
derived above, we performed a joint fit of the HATNet, RV
and FU data, i.e. fitting all of the parameters simultaneously.
The reason for such a joint fit is that the three separate
data-sets and the fitted parameters are intertwined.
For example, the epoch (depending partly on the RV fit)
has a relatively large error, affecting the extrapolation of
the transit center to the KeplerCam follow-up.

In the discovery report, in all of the above procedures,
we used the downhill simplex method (DHSX, Sec. 2.12.16)
to search for the best fit values and the method of refitting
to synthetic data sets (called EMCE, see also Sec. 2.12.16)
to estimate the errors of the adjusted parameters. A refined
analysis was also carried out, based on the HATNet light
curves reduced by the method of image subtraction photometry
and on the additional photometric measurement from the
night of 2008 July 30. In this analysis the extended Markov
Chain Monte-Carlo algorithm (XMMC) was employed, also
in the form of the implementation found in the program lfit.
As mentioned in Sec. 2.12.16, the XMMC method
used in this particular analysis was also aided by the
DHSX minimization, which served both as a first iteration and
as a sanity check of the chain convergence (see also Sec. 3.3).
Both of these error estimation methods (EMCE and XMMC) yield
a Monte-Carlo set of the a posteriori distribution of the fit
parameters, which were subsequently used in the derivation
of the final planetary, orbital and stellar characteristics.

The third step of the analysis was the derivation of the 
stellar parameters, based on the spectroscopic analysis of the 



Table 8. Relative radial velocity (RV) and bisector span (BS) 
measurements of HAT-P-7. The RV and BS data points, as well 
as their formal errors are given in units of m/s. 



BJD              RV        σ_RV    BS      σ_BS

2454336.73121    ...       ...     5.30    5.36
2454336.73958    +124.40   1.63    0.68    5.10
2454336.85366    +73.33    1.48    4.82    6.17
2454337.76211    -223.89   1.60    -1.94   5.30
2454338.77439    +166.71   1.39    2.58    5.35
2454338.85455    +144.67   1.42    7.60    5.22
2454339.89886    -241.02   1.46    -5.13   5.77
2454343.83180    -145.42   1.66    -8.30   6.58
2454344.98804    +101.05   1.91    -5.62   5.80

host star (high resolution spectroscopy using Keck/HIRES), 
and the physical modeling of the stellar evolution, based on 
existing isochrone models. As the fourth step, we then com- 
bined the results of the joint fit and stellar parameter deter- 
mination to determine the planetary and orbital parameters 
of the HAT-P-7b system. In the following we summarize 
these steps. 



3.3.1 Independent fits

For the independent fit procedure, we first analyzed the
HATNet light curves, as observed by the HAT-6, HAT-7,
HAT-8 and HAT-9 telescopes. Using the initial period and
transit length from the BLS analysis, we fitted a model to
the 214 cycles of observations spanned by all the HATNet
data. Although at this stage we were interested only in the
epoch and period, we used the transit light curve model
with the assumption of quadratic limb darkening, where the
flux decrease was calculated using the models provided by
Mandel & Agol (2002). In principle, fitting the epoch and
period as two independent variables is equivalent to fitting
the time instants of the centers of the first and last
observed individual transits, T_c,first and T_c,last, with the
constraint that all intermediate transits are regularly spaced
with period P. Note that this fit takes into account all
transits that occurred during the HATNet observations, even
though it is described only by T_c,first and T_c,last. The fit
yielded T_c,first = 2453153.0924 ± 0.0021 (BJD) and T_c,last =
2453624.9044 ± 0.0023 (BJD). The correlation between these
two epochs turned out to be C(T_c,first, T_c,last) = -0.53.
The period derived from the T_c,first and T_c,last epochs was
P^(1) = 2.20480 ± 0.00049 days. Using these values, we found
that there were 326 cycles between T_c,last and the end of the
RV campaign. The epoch extrapolated to the approximate
time of RV measurements was T_c,RV = 2454343.646 ± 0.008
(BJD). Note that the error in T_c,RV is much smaller than
the period itself (~ 2.2 days), so there is no ambiguity in
the number of elapsed cycles when folding the periodic
signal.
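The extrapolation of the epoch to the time of the RV measurements, together with its uncertainty, can be sketched with standard linear error propagation. The snippet below takes the period as the difference of the two fitted epochs divided by the 214 elapsed cycles and includes the quoted correlation between the epochs; the variable names are illustrative, and the Gaussian error propagation is an assumption of this sketch rather than a description of the actual error estimation.

```python
import math

# Fitted epochs of the first and last HATNet transits (BJD) and their errors.
t_first, sig_first = 2453153.0924, 0.0021
t_last, sig_last = 2453624.9044, 0.0023
corr = -0.53      # correlation between the two epochs
n_span = 214      # cycles between the first and last transit
n_ahead = 326     # cycles from the last transit to the RV campaign

# Period implied by the two epochs, then the extrapolated epoch.
period = (t_last - t_first) / n_span
t_rv = t_last + n_ahead * period

# t_rv is a linear combination: c_last * t_last + c_first * t_first.
c_last = 1.0 + n_ahead / n_span
c_first = -n_ahead / n_span
variance = (
    (c_last * sig_last) ** 2
    + (c_first * sig_first) ** 2
    + 2.0 * c_last * c_first * corr * sig_first * sig_last
)
sig_rv = math.sqrt(variance)

print(f"extrapolated epoch: {t_rv:.3f} +/- {sig_rv:.3f} (BJD)")
```

Because the two epochs are anti-correlated and enter with opposite signs, the covariance term increases the extrapolation error instead of reducing it.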

We then analyzed the radial velocity data in the following
way. We defined the N_tr = -29 transit as that being closest
to the end of the radial velocity measurements. This means
that the first transit observed by HATNet (at T_c,first) was the
N_tr,first = -569 event. Given the short period, we assumed
that the orbit has been circularized (Hut 1981) (later
verified; see below). The orbital fit is linear if we choose the
Figure 35. Probability distributions and mutual correlations of
the adjusted parameters T_-569, T_+123, K, R_p/R⋆, b² and ζ/R⋆
for the planet HAT-P-7b. These are the only adjusted parameters
of the analysis that are explicitly related to the physical properties
of the planet and its orbit. The derivation of these distributions
was performed exploiting the extended Markov Chain Monte-Carlo
(XMMC) algorithm as it is implemented in the program
lfit (the related output is shown partially in Fig. 34). See text
for further details.



radial velocity zero-point γ and the amplitudes A and B as
adjusted values, namely:

    v(t) = \gamma + A \cos\left[\frac{2\pi}{P}(t - t_0)\right] + B \sin\left[\frac{2\pi}{P}(t - t_0)\right],    (95)

where t_0 is an arbitrary time instant (chosen to be t_0 =
2454342.6 BJD), K ≡ (A² + B²)^(1/2) is the semi-amplitude of
the RV variations, and P is the initial period P^(1) taken from
the previous independent HATNet fit. The actual epoch can
be derived from the above equation: for circular orbits,
the transit center occurs at the time instant when the RV
curve has the most negative slope, i.e. the smallest
time-derivative. Therefore the actual epoch of the transit must be:

    T_c = t_0 + \frac{P}{2\pi} \arg(-B, A) = t_0 + \frac{P}{2\pi} \arctan\left(-\frac{A}{B}\right).    (96)


Using the equations above, we derived the initial epoch of
the N_tr = -29 transit center to be T^(1)_c,-29 = 2454343.6462 ±
0.0042 (BJD). We also performed a more general
(non-linear) fit to the RV in which we let the eccentricity
float. This fit yielded an eccentricity consistent with zero,
namely e cos ω = -0.003 ± 0.007 and e sin ω = 0.000 ± 0.010.
Therefore, we adopt a circular orbit in the further analysis.
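The linear circular-orbit fit can be illustrated with a small weighted least-squares sketch applied to the radial velocities of Table 8 (the iodine-free template has no RV and is omitted). This is not the lfit implementation: the function names are hypothetical, the period is fixed to the refined value of 2.204732 days quoted in the text, and the formal errors are inflated in quadrature by the 3.8 m/s jitter term mentioned in this section. The transit epoch is then taken at the phase of the most negative RV slope.

```python
import math

# Table 8: BJD, RV [m/s], formal error [m/s]; the template row has no RV.
RV_DATA = [
    (2454336.73958, 124.40, 1.63), (2454336.85366, 73.33, 1.48),
    (2454337.76211, -223.89, 1.60), (2454338.77439, 166.71, 1.39),
    (2454338.85455, 144.67, 1.42), (2454339.89886, -241.02, 1.46),
    (2454343.83180, -145.42, 1.66), (2454344.98804, 101.05, 1.91),
]


def solve_linear(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_circular_orbit(data, period, t0, jitter=3.8):
    """Weighted linear fit of v(t) = gamma + A cos(phase) + B sin(phase)."""
    normal = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for t, rv, err in data:
        phase = 2.0 * math.pi * (t - t0) / period
        basis = (1.0, math.cos(phase), math.sin(phase))
        weight = 1.0 / (err**2 + jitter**2)   # jitter added in quadrature
        for i in range(3):
            rhs[i] += weight * basis[i] * rv
            for j in range(3):
                normal[i][j] += weight * basis[i] * basis[j]
    return solve_linear(normal, rhs)


period, t0 = 2.204732, 2454342.6
gamma, a, b = fit_circular_orbit(RV_DATA, period, t0)
k = math.hypot(a, b)                          # RV semi-amplitude
phi = math.atan2(a, -b) % (2.0 * math.pi)     # phase of most negative slope
tc = t0 + period * phi / (2.0 * math.pi)      # transit epoch after t0

print(f"gamma = {gamma:.1f} m/s, A = {a:.1f} m/s, B = {b:.1f} m/s")
print(f"K = {k:.1f} m/s, Tc = {tc:.4f} (BJD)")
```

Using the two-argument arctangent avoids the branch ambiguity of a plain arctan(-A/B), which could otherwise place the epoch half a period away from the transit.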

Combining the RV epoch T^(1)_c,-29 with the first epoch
observed by HATNet (T_c,first), we obtained a somewhat
refined period, P^(2) = 2.204732 ± 0.000016 days. This was
fed back into phasing the RV data, and we performed the
RV fit again to the parameters γ, A and B. The fit yielded



7 

and T^(2)_c,-29 = 2454343.6470 ± 0.0042 (BJD). This epoch was
used to further refine the period to get P^(3) = 2.204731 ±
0.000016 d, where the error calculation assumes that T_c,-29
and T_c,-569 are uncorrelated. At this point we stopped the
above iterative procedure of refining the epoch and period;
instead, a final refinement of epoch and period was
obtained through performing a joint fit (as described later in
Sec. 3.3.2). We note that in order to get a reduced chi-square
value near unity for the radial velocity fit, it was necessary
to add in quadrature a noise component with an
amplitude of 3.8 m s⁻¹, which is well within the range of stellar
jitter observed for late F stars; see Butler et al. (2006).

Using the improved period P^(3) and the epoch T_c,-29,
we extrapolated to the center of the KeplerCam follow-up
transit (29 cycles after the RV epoch). Since the follow-up
observation only recorded a partial event (see Fig. 33), this
extrapolation was necessary to improve the light curve
modeling. For this, we used a quadratic limb-darkening
approximation, based on the formalism of Mandel & Agol (2002).
The limb-darkening coefficients were based on the results of
the SME analysis (notably, T_eff; see Sec. 3.3.4 for further
details), which yielded γ_1^(z) = 0.1329 and γ_2^(z) = 0.3738.
Using these values and the extrapolated time of the transit
center, we adjusted the light curve parameters: the relative
radius of the planet p ≡ R_p/R⋆, the square of the impact
parameter b² and the quantity
ζ/R⋆ = (a/R⋆)(2π/P)(1 - b²)^(-1/2) as independent
parameters (see Bakos et al. 2007, for the choice of
parameters). The result of the fit was p = 0.0762 ± 0.0012,
b² = 0.205 ± 0.144 and ζ/R⋆ = 13.60 ± 0.83 day⁻¹, where the
uncertainty of the transit center time due to the relatively
high error in the transit epoch T_c,-29 was also taken into
account in the error estimates.
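To show how the adjusted light curve parameters map onto more familiar quantities, the sketch below inverts the definition ζ/R⋆ = (a/R⋆)(2π/P)(1 - b²)^(-1/2) for a/R⋆ and then derives the orbital inclination from b = (a/R⋆) cos i. The best-fit values of this independent fit and the period of 2.204731 days are taken from the text; the uncertainties are ignored here, and the depth estimate neglects limb darkening, so these numbers are only illustrative.

```python
import math

# Best-fit values of the independent follow-up fit (from the text).
p = 0.0762          # R_p / R_star
b_squared = 0.205   # square of the impact parameter
zeta = 13.60        # zeta / R_star, in 1/day
period = 2.204731   # days

# Invert zeta/R = (a/R) (2 pi / P) / sqrt(1 - b^2) for a/R.
a_over_r = zeta * period / (2.0 * math.pi) * math.sqrt(1.0 - b_squared)

# Impact parameter b = (a/R) cos(i) gives the inclination.
inclination_deg = math.degrees(math.acos(math.sqrt(b_squared) / a_over_r))

# Geometric transit depth without limb darkening, in mmag.
depth_mmag = -2.5 * math.log10(1.0 - p**2) * 1000.0

print(f"a/R_star    = {a_over_r:.2f}")
print(f"inclination = {inclination_deg:.1f} deg")
print(f"depth       = {depth_mmag:.1f} mmag")
```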



3.3.2 Joint fit based on the aperture photometry data and 
the single partial follow-up light curve 

The results of the individual fits described above provide
the starting values for a joint fit, i.e. a simultaneous fit to
all of the available HATNet, radial velocity and the partial
follow-up light curve data. The adjusted parameters were
T_c,-569, the time of first transit center in the HATNet
campaign, m, the out-of-transit magnitude of the HATNet light
curve in I-band and the previously defined parameters of γ,
A, B, p, b² and ζ/R⋆. We note that in this joint fit all of
the transits in the HATNet light curve have been adjusted
simultaneously, tied together by the constraint of assuming
a strictly periodic signal; the shapes of all these transits were
characterized by p, b² and ζ/R⋆ (and the limb-darkening
coefficients) while the distinct transit center time instants were
interpolated using T_c,-569 ≡ T_c,first and A, B via the RV fit.
For initial values we used the results of the independent fits
(Sec. 3.3.1). The error estimation based on the method of refitting
to synthetic data sets gives the distribution of the adjusted
values, and moreover, this distribution can be used directly
as an input for a Monte-Carlo parameter determination for
stellar evolution modeling, as described later (Sec. 3.3.4).

Final results of the joint fit were: T_c,-569 =
2453153.0924 ± 0.0015 (BJD), m = 9.85053 ± 0.00015 mag,
γ = -37.0 ± 1.5 m s⁻¹, A = 33.8 ± 0.9 m s⁻¹, B = 210.7 ±
1.9 m s⁻¹, p = 0.0763 ± 0.0010, b^ ^Q.n^toXtl and ζ/R⋆ =
Figure 34. The output of the program lfit showing the results of the extended Markov Chain Monte-Carlo (XMMC) analysis related
to the HAT-P-7(b) planetary system. The parameters in the output are T_-569, T_+123, K, R_p/R⋆, b² and ζ/R⋆, respectively. For clarity,
the other parameters were cut from the output list.



13.34 ± 0.23 day⁻¹. Using the distribution of these parameters,
it is straightforward to obtain the values and the errors
of the additional parameters derived from the joint
fit, namely T_c,-29, a/R⋆, K and P. All final fit parameters
are listed in Table 10.



3.3.3 Joint fit based on the image subtraction photometry 
data and both of the follow-up light curves 

Involving the additional recent follow-up photometry data 
from 2008 July 30 and the HATNet light curve obtained 
by the method based on image subtraction, we repeated 
the analysis of the available data. In this new analysis, the 
method of extended Markov Chain Monte-Carlo (XMMC) 
has been employed to derive the best fit parameters and 
their a posteriori distributions. Due to the presence of a 
complete photometric follow-up light curve, we have used 
Figure 36. (a) Radial-velocity measurements from Keck for
HAT-P-7, along with an orbital fit, shown as a function of orbital
phase, using our best fit as period (see Sec. 3.3.2). The
center-of-mass velocity has been subtracted. (b) Phased residuals after
subtracting the orbital fit (also see Sec. 3.3.2). The rms variation
of the residuals is about 3.8 m s⁻¹. (c) Bisector spans (BS) for the
8 Keck spectra plus the single template spectrum, computed as
described in the text. The mean value has been subtracted. Due
to the relatively small errors compared to the RV amplitude, the
vertical scale on the (b) and (c) panels differs from the scale used
on the top panel.



a slightly different set of parameters. Moreover, the trend
filtering based on the EPD algorithm has been performed
simultaneously with the fit. Thus, the set of adjusted
parameters that are related to the physical properties of the
planetary system were the following: the center of the first
transit measured by the HATNet telescopes, T_c,-569; the
transit center of the last follow-up photometry, T_c,+123; the
radial velocity semi-amplitude K; and the light curve
parameters R_p/R⋆, b² and ζ/R⋆. Additionally, the out-of-transit
magnitudes (both for the HATNet photometry and the two
follow-up photometry), the zero-point of the radial velocity
γ, and the EPD coefficients for the two follow-up photometry
were also included in the fit. The EPD was performed
up to the first order against the profile sharpness parameters
(S, D, K), the hour angle and the airmass. In the case of the
HATNet photometry, we incorporated an additional parameter,
an instrumental blend factor whose inclusion was based
on the experience that HATNet light curves tend to slightly
underestimate the depth of the transits. To have a general
purpose analysis, we extended the parameter set with the
Lagrangian orbital elements k = e cos ω and h = e sin ω, but
based on our assumption of circular orbits, these were fixed to
be zero in the case of HAT-P-7b.

The XMMC analysis was performed in three ways.
First, a full XMMC run was accomplished, involving all of
the 23 parameters discussed above (6 physical parameters,
3 out-of-transit magnitudes, the radial velocity zero-point,
the 2 × 5 EPD coefficients, the instrumental blend factor
and the fixed Lagrangian orbital elements). Second, we
separated the 2 × 5 linear EPD coefficients from the merit
function and ran the Markov chains while minimizing the
χ² accordingly in each step of the chain. Third, we derived
the best fit parameters using the downhill simplex algorithm
and during the XMMC run we kept the EPD coefficients
fixed to their best fit values. All of these fits converged
successfully and all of the sanity checks mentioned in
Sec. 2.12.16 were adequate, namely a) the a posteriori
distribution centers of the adjusted parameters (median values)
agreed well with the downhill simplex best fit values, b) the
chain acceptance ratio was in agreement with the theoretical
expectations, c) the correlation lengths for the parameter
chains were sufficiently small, all of them being smaller than
~ 2.6, and d) the covariance estimations from the Fisher
information matrix agreed well, within a factor of ~ 1.2,
with the statistical covariances derived from the a posteriori
distributions. See also Fig. 34, which shows the (slightly
clarified and simplified) output of the lfit program related
to this particular analysis. In all of the cases, we used
a Gaussian a priori distribution for the transitions, where
the covariance matrix of this Gaussian was derived from
the Fisher matrix evaluated at the downhill simplex best fit
value. In Fig. 35, the distributions and some statistics for
the 6 parameters related to the physical planetary (and
orbital) parameters are displayed. The plots in Fig. 35 clearly
show how the proper selection of the adjusted parameters
can help to reduce the mutual correlations. The only
significant correlation is between ζ/R⋆ and T_c,+123. This
correlation results from the lack of a good quality complete follow-up
photometry (due to its large scatter, the contribution of the
second follow-up light curve is relatively smaller).
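Two of the sanity checks listed above, the agreement of the chain median with the best fit value and the chain acceptance ratio, can be illustrated with a generic Metropolis sampler. The sketch below is not the XMMC algorithm and not the lfit implementation: it samples a one-dimensional Gaussian toy posterior whose center and width are borrowed from the ζ/R⋆ result purely for illustration, with a Gaussian transition distribution of the same width. All function names are hypothetical.

```python
import math
import random
import statistics


def log_target(x, mean=13.45, sigma=0.22):
    """Log-density (up to a constant) of a toy Gaussian posterior."""
    return -0.5 * ((x - mean) / sigma) ** 2


def run_chain(n_steps, start, step_sigma, seed=1):
    """Metropolis chain with Gaussian transitions; returns chain and acceptance."""
    rng = random.Random(seed)
    chain, current, accepted = [], start, 0
    current_logp = log_target(current)
    for _ in range(n_steps):
        proposal = rng.gauss(current, step_sigma)
        proposal_logp = log_target(proposal)
        # Accept with probability min(1, target ratio).
        if rng.random() < math.exp(min(0.0, proposal_logp - current_logp)):
            current, current_logp = proposal, proposal_logp
            accepted += 1
        chain.append(current)
    return chain, accepted / n_steps


# Start from the "best fit" value, mimicking a simplex-aided first iteration.
chain, acceptance = run_chain(20000, start=13.45, step_sigma=0.22)
median = statistics.median(chain)

print(f"acceptance ratio: {acceptance:.2f}")
print(f"chain median:     {median:.3f}")
```

In a real analysis the same two diagnostics would be evaluated for every adjusted parameter, alongside the correlation lengths and the comparison with the Fisher matrix covariances.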

For the final set of the parameters we accepted the
distribution that was derived using the third method mentioned
above (i.e. when in the XMMC runs the 2 × 5 EPD parameters
were fixed to their best fit values). The derived best
fit parameters that are related to physical quantities were
the following: T_c,-569 = 2453153.09286 ± 0.00105 (BJD),
T_c,+123 = 2454678.76582 ± 0.00137 (BJD), K = 213.4 ±
1.9 m s⁻¹, p ≡ R_p/R⋆ = 0.07619 ± 0.0009, b² = 0.206 ± 0.103
and ζ/R⋆ = 13.45 ± 0.22 day⁻¹. Comparing these values
with the ones presented in Sec. 3.3.2, the improvements
in the parameter uncertainties are quite conspicuous. In
particular, the new, image subtraction based HATNet light curve
has decreased the uncertainty in the first transit epoch
T_c,-569 with its significantly better quality. In the further
analysis, we incorporated these distributions in order to
derive the final stellar, planetary and orbital parameters.



3.3.4 Stellar parameters 

The results of the joint fit enable us to refine the parameters
of the star. First, the iodine-free template spectrum
from Keck was used for an initial determination of the
atmospheric parameters. Spectral synthesis modeling was
carried out using the SME software (Valenti & Piskunov
1996), with wavelength ranges and atomic line data as
described by Valenti & Fischer (2005). We obtained the
following initial values: effective temperature $T_{\rm eff}$ = 6350 ± 80 K,
surface gravity $\log g_\star$ = 4.06 ± 0.10 (cgs), iron abundance
[Fe/H] = +0.26 ± 0.08, and projected rotational velocity
$v \sin i$ = 3.8 ± 0.5 km s$^{-1}$. The rotational velocity is
slightly smaller than the value given by the DS measurements.
The temperature and surface gravity correspond to
a slightly evolved F6 star. The uncertainties quoted here
and in the remainder of this discussion are twice the
statistical uncertainties of the values given by the SME analysis.
This reflects our attempt, based on prior experience,
to incorporate systematic errors (e.g. Noyes et al. 2008;
see also Valenti & Fischer 2005). Note that the previously
discussed limb darkening coefficients, $\gamma_1^{(z)}$, $\gamma_2^{(z)}$, $\gamma_1^{(I)}$ and $\gamma_2^{(I)}$,
have been taken from the tables of Claret (2004) by
interpolation to the above-mentioned SME values for $T_{\rm eff}$, $\log g_\star$,
and [Fe/H].

Table 9. Stellar parameters for HAT-P-7. The values of effective
temperature, metallicity and projected rotational velocity are
based on purely spectroscopic data, while the other ones are
derived from both the spectroscopy and the joint light curve
and stellar evolution modelling.

Parameter             Value               Source
T_eff (K)             6350 ± 80           SME
[Fe/H]                +0.26 ± 0.08        SME
v sin i (km s^-1)     3.8 ± 0.5           SME
M⋆ (M⊙)               1.49 +0.06/-0.05    Y²+LC+SME
R⋆ (R⊙)               1.92 +…/-…          Y²+LC+SME
log g⋆ (cgs)          4.05 +0.04/-…       Y²+LC+SME
L⋆ (L⊙)               5.3 +…/-0.6         Y²+LC+SME
M_V (mag)             2.91 ± 0.16         Y²+LC+SME
Age (Gyr)             2.1 ± 1.0           Y²+LC+SME
Distance (pc)         320 +…/-…           Y²+LC+SME

As described by Sozzetti et al. (2007), $a/R_\star$ is a better
luminosity indicator than the spectroscopic value of $\log g_\star$,
since the variation of stellar surface gravity has a subtle
effect on the line profiles. Therefore, we used the values of
$T_{\rm eff}$ and [Fe/H] from the initial SME analysis, together with
the distribution of $a/R_\star$, to estimate the stellar properties
from comparison with the Yonsei-Yale (Y²) stellar evolution
models by Yi et al. (2001). Since a Monte-Carlo set of $a/R_\star$
values has been derived during the joint fit, we performed
the stellar parameter determination as follows. For a selected
value of $a/R_\star$, two Gaussian random values were drawn for
$T_{\rm eff}$ and [Fe/H] with the mean and standard deviation as
given by SME (with formal SME uncertainties doubled as
indicated above). Using these three values, we searched for the
nearest isochrone and the corresponding mass by using the
interpolator provided by Demarque et al. (2004). Repeating
this procedure for all values of $a/R_\star$, $T_{\rm eff}$ and [Fe/H], the
a posteriori distribution of the stellar parameters was
obtained, including the mass, radius, age, luminosity and
color (in multiple bands). The age determined in this way is
2.2 Gyr with a statistical uncertainty of ±0.3 Gyr; however,
the uncertainty in the theoretical isochrone ages is about
1.0 Gyr. Since the corresponding value for the surface gravity
of the star, $\log g_\star = 4.05^{+0.04}_{-\ldots}$ (cgs), is well within 1-σ of
the value determined by the SME analysis, we accept the
values from the joint fit as the final stellar parameters. These
parameters are summarized in Table 9.

We note that the Yonsei-Yale isochrones contain the absolute
magnitudes and colors for different photometric bands
from U up to M, providing an easy comparison of the estimated
and the observed colors. Using these data, we determined
the $V - I$ and $J - K$ colors of the best fitted stellar
model: $(V - I)_{\rm YY}$ = 0.54 ± 0.02 and $(J - K)_{\rm YY}$ = 0.27 ± 0.02.
Since the colors for the infrared bands provided by Yi et al.
(2001) and Demarque et al. (2004) are given in the ESO
photometric standard system, for the comparison with catalog
data we converted the infrared color $(J - K)_{\rm YY}$ to
the 2MASS system $(J - K_S)$ using the transformations
given by Carpenter (2001). The color of the best fit stellar
model was $(J - K_S)_{\rm YY}$ = 0.25 ± 0.03, which is in fairly
good agreement with the actual 2MASS color of HAT-P-7:
$(J - K_S)$ = 0.22 ± 0.04. We have also compared the $(V - I)_{\rm YY}$
color of the best fit model to the catalog data, and found that
although HAT-P-7 has a low galactic latitude, $b_{\rm II}$ = 13.8°,
the model color agrees well with the observed TASS color of
$(V - I)_{\rm TASS}$ = 0.60 ± 0.07 (see Droege et al. 2006). Hence,
the star is not affected by interstellar reddening within
the errors, since $E(V - I) \equiv (V - I)_{\rm TASS} - (V - I)_{\rm YY}$ =
0.06 ± 0.07. For estimating the distance of HAT-P-7, we
used the absolute magnitude $M_V$ = 2.91 ± 0.16 (resulting
from the isochrone analysis, see also Table 9) and the
observed magnitude $V_{\rm TASS}$ = 10.51 ± 0.06. These two yield a
distance modulus of $V_{\rm TASS} - M_V$ = 7.51 ± 0.28, i.e. a distance
of $d = 320^{+\ldots}_{-\ldots}$ pc.
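The conversion from distance modulus to distance used above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, interstellar extinction is neglected (in line with the negligible reddening found above), and the asymmetric error bars are obtained by simply propagating the ±1σ limits of the modulus.

```python
def distance_from_modulus(modulus, modulus_err):
    """Distance in parsecs from a distance modulus m - M (no extinction).

    Returns (d, minus, plus), where minus/plus are the one-sided
    uncertainties obtained by shifting the modulus by -/+ its error.
    """
    d = 10.0 ** (modulus / 5.0 + 1.0)
    d_low = 10.0 ** ((modulus - modulus_err) / 5.0 + 1.0)
    d_high = 10.0 ** ((modulus + modulus_err) / 5.0 + 1.0)
    return d, d - d_low, d_high - d


# Distance modulus quoted in the text: 7.51 +/- 0.28 mag
d, minus, plus = distance_from_modulus(7.51, 0.28)
print(f"d = {d:.0f} pc (+{plus:.0f} / -{minus:.0f})")
```

With the quoted modulus this gives a distance close to 320 pc, with a slightly larger upper than lower uncertainty, as expected from the logarithmic relation.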

3.3.5 Planetary and orbital parameters

The determination of the stellar properties was followed by
the characterization of the planet itself. Since Monte-Carlo
distributions were derived for both the light curve and the
stellar parameters, the final planetary and orbital data were
also obtained by the statistical analysis of the a posteriori
distribution of the appropriate combination of these two
Monte-Carlo data sets. We found that the mass of the planet
is $M_p = 1.800^{+\ldots}_{-\ldots}\,M_{\rm J}$, the radius is $R_p = 1.421^{+0.144}_{-0.097}\,R_{\rm J}$
and its density is $\rho_p$ = 0.78 ± 0.16 g cm$^{-3}$. We note that
in the case of binary systems with large mass and radius
ratios (such as the one here) there is a strong correlation
between $M_p$ and $R_p$ (see e.g. Beatty et al. 2007). This
correlation is also exhibited here with $C(M_p, R_p)$ = 0.81. The
final planetary parameters are also summarized at the bottom
of Table 10.
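As a quick consistency check of the derived quantities, the mean density and surface gravity of a planet follow directly from its mass and radius. The sketch below uses only the central values (1.800 M_J and 1.421 R_J, adopted here for illustration) rather than the full Monte-Carlo distributions used in the analysis, so it reproduces the quoted medians only approximately. The numerical constants for Jupiter are assumed nominal values, and the function name is hypothetical.

```python
import math

G_CGS = 6.674e-8      # gravitational constant [cm^3 g^-1 s^-2]
M_JUP_G = 1.8986e30   # Jupiter mass [g] (assumed nominal value)
R_JUP_CM = 7.1492e9   # Jupiter equatorial radius [cm] (assumed nominal value)


def planet_density_and_gravity(mass_mj, radius_rj):
    """Return (mean density [g cm^-3], log10 surface gravity [cgs])."""
    mass = mass_mj * M_JUP_G
    radius = radius_rj * R_JUP_CM
    density = mass / (4.0 / 3.0 * math.pi * radius ** 3)
    log_g = math.log10(G_CGS * mass / radius ** 2)
    return density, log_g


rho, log_g = planet_density_and_gravity(1.800, 1.421)
print(f"rho_p = {rho:.2f} g/cm^3, log g_p = {log_g:.2f} (cgs)")
```

Both numbers agree with the tabulated density and surface gravity to within the rounding of the inputs.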

Due to the way we derived the period, i.e. $P = (T_{c,-29} -
T_{c,-569})/540$, one can expect a large correlation between
the epochs $T_{c,-29}$, $T_{c,-569}$ and the period itself. Indeed,
$C(T_{c,-569}, P) = -0.783$ and $C(T_{c,-29}, P) = 0.704$, while
the correlation between the two epochs is relatively small:
$C(T_{c,-569}, T_{c,-29}) = -0.111$. It is easy to show that if the
signs of the correlations between two epochs $T_A$ and $T_B$ (in
our case $T_{c,-29}$ and $T_{c,-569}$) and the period are different,
then there exists an optimal epoch $E$, which has
the smallest error among all of the interpolated epochs. We
note that $E$ is such that it also exhibits the smallest correlation
with the period. If $\sigma(T_A)$ and $\sigma(T_B)$ are the respective
uncorrelated errors of the two epochs, then

$$E = \left[\frac{T_A\,\sigma(T_B)^2 + T_B\,\sigma(T_A)^2}{\sigma(T_B)^2 + \sigma(T_A)^2}\right], \qquad (97)$$

where the square brackets, $[t]$, denote the time of the transit event
nearest to the time instance $t$. In the case of HAT-P-7b,
$T_A = T_{c,-569}$ and $T_B = T_{c,-29}$, the corresponding epoch is
the event $N_{\rm tr} = -280$ at $E \equiv T_{c,-280}$ = 2,453,785.8503 ±
0.0008 (BJD). The final ephemeris and planetary parameters
are summarized in Table 10.
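Equation (97) is an inverse-variance weighted mean of the two epochs, snapped to the nearest transit event. A minimal Python sketch follows; the function and argument names are hypothetical, and the numbers in the usage example are made up for illustration and are not the HAT-P-7 values.

```python
def optimal_epoch(t_a, sigma_a, n_a, t_b, sigma_b, n_b):
    """Return (event number, time) of the optimal epoch of equation (97).

    t_a, t_b         : times of two transit centers
    sigma_a, sigma_b : their uncorrelated uncertainties
    n_a, n_b         : the corresponding transit event numbers
    """
    period = (t_b - t_a) / (n_b - n_a)
    # Inverse-variance weighted mean: the more precise epoch gets more weight.
    t_weighted = (t_a * sigma_b ** 2 + t_b * sigma_a ** 2) / (
        sigma_a ** 2 + sigma_b ** 2
    )
    # The square brackets of eq. (97): snap to the nearest transit event.
    n_opt = n_a + round((t_weighted - t_a) / period)
    return n_opt, t_a + (n_opt - n_a) * period


# Illustrative values only: epoch A is three times more precise than epoch B,
# so the optimal epoch lies much closer to A.
n_opt, epoch = optimal_epoch(100.0, 0.001, -50, 300.0, 0.003, 50)
print(n_opt, epoch)
```

Because the weights are proportional to the variance of the *other* epoch, the optimal event always falls between the two measured epochs and nearer to the better determined one.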
Table 10. Orbital and planetary parameters for HAT-P-7. The
parameters are derived from the joint modelling of the photometric,
radial velocity and spectroscopic data.

Parameter                   Value
P (days)                    2.2047298 ± 0.0000024
E (BJD - 2,400,000)         53,785.8503 ± 0.0008
T14 (days)^a                0.1625 ± 0.0029
T12 = T34 (days)^a          0.0141 ± 0.0020

a/R⋆                        … +…/-0.28
Rp/R⋆                       0.0761 ± 0.0009
b = a cos i / R⋆            0.44 +0.10/-0.15
i (deg)                     … +2.2/-2.0
Transit duration (days)     0.1461 ± 0.0016
(γ1, γ2)                    (0.1195, 0.3595)

K (m s^-1)                  213.2 ± 1.9
γ (km s^-1)                 -37.0 ± 1.5
e                           0 (adopted)

Mp (MJ)                     1.800 +…/-…
Rp (RJ)                     1.421 +0.144/-0.097
C(Mp, Rp)                   0.81
ρp (g cm^-3)                0.78 ± 0.16
a (AU)                      0.0379 ± 0.0004
log gp (cgs)                3.34 ± 0.07
Teq (K)                     2175 +85/-60

^a T14: total transit duration, time between first and last contact;
T12 = T34: ingress/egress time, time between first and second, or
third and fourth contact.

4 FOLLOW-UP OBSERVATIONS 

Now we shift our attention to another system, the
eccentric transiting planetary system HAT-P-2b. At
the time of its discovery, HAT-P-2b was the longest period
and most massive transiting extrasolar planet (TEP),
and the only one on an eccentric orbit (Bakos et al.
2007b). Since then, other TEPs have also been discovered
with significant orbital eccentricity and long period:
GJ 436b (Gillon et al. 2007), HD 17156b (Barbieri et al.
2007) and XO-3b (Johns-Krull et al. 2008). (See, e.g.
http://exoplanet.eu for an up-to-date database of
transiting extrasolar planets.)

Planet HAT-P-2b was detected as a transiting object
during the campaign of the HATNet telescopes (Bakos et al.
2002, 2004) and the Wise HAT telescope (WHAT; Shporer et al.
2006). The HATNet telescopes and the WHAT telescope
gathered ~ 26,000 individual photometric measurements.
The planetary transit was followed up by the FLWO 1.2 m
telescope, utilizing the KeplerCam detector. The planetary
properties have been confirmed by radial velocity measurements
and bisector analysis of the spectral line profiles. The
latter has shown no bisector variations, excluding the
possibilities of a hierarchical triple or a blended eclipsing binary.

Recently, the spin-orbit alignment of the HAT-P-2(b)
system was measured by Winn et al. (2007) and
Loeillet et al. (2008). Both studies reported an alignment
consistent with zero within an uncertainty of ~ 10°. These
results are exceptionally interesting since short period planets
are thought to have formed at much larger distances
from their parent star and migrated inward while the orbital
eccentricity is damped, yielding an almost circular orbit
(D'Angelo, Lubow & Bate 2006). Physical mechanisms
such as Kozai interaction between the transiting planet
and an unknown massive companion on an inclined orbit
could result in tight eccentric orbits (Fabrycky & Tremaine
2007; Takeda, Kita & Rasio 2008). However, in such a
scenario, the spin-orbit misalignment can be expected to be
significantly larger than the measured one. For instance, in the
case of XO-3b (Hébrard et al. 2008), the reported alignment
is λ = 70° ± 15°. In multiple planetary systems, planet-planet
scattering can also yield eccentric orbits (see e.g.
Ford & Holman 2007).

The physical properties of the host star HAT-P-2 have
been controversial since different methods for stellar
characterization resulted in stellar radii between ~ 1.4 R⊙ and
~ 1.8 R⊙. Moreover, the actual distance of the system also
had large systematic errors, since the reported Hipparcos
distance seemed to be significantly larger than what could
be expected from the absolute luminosity (coming from the
stellar evolution modelling).

In this chapter new photometric and spectroscopic
observations of the planetary system HAT-P-2(b) are presented,
and I demonstrate how the photometry package can
be used in the case of a follow-up observation. The new
photometric measurements significantly improve the light
curve parameters, therefore some of the stellar parameters
are more accurately constrained. In addition, radial velocity
measurements based on spectroscopic observations have
resulted in significantly smaller uncertainties, which, due to the
orbital eccentricity, also affect the results of the stellar
evolution modelling. In Sec. 4.1 we summarize our photometric
observations of this system, while in Sec. 4.2 we describe
briefly the issues related to the radial velocity data points.
The details of a new formalism used in the characterization
of the radial velocities are discussed in Sec. 4.3 and the steps
of the complete analysis are described in Sec. 4.4. We
summarize our results in Sec. 4.5.

4.1 Photometric observations and reductions 

In the present analysis we utilize photometric data obtained
by the HATNet telescopes (published in Bakos et al. 2007b)
and by the KeplerCam detector mounted on the FLWO 1.2 m
telescope. The HATNet photometry has already been
presented in Bakos et al. (2007b). These HATNet data are
plotted in Fig. 37, superimposed with our new best-fit
model (see Sec. 4.4 for details on light curve modelling).
We observed the planetary transit six times, on 2007 March
18, 2007 April 21, 2007 May 08, 2007 June 22, 2008 March
24 and 2008 May 25, yielding 4 nearly complete and 2 partial
transit light curves. One of these follow-up light curves
(2007 April 21) has already been published in the discovery
paper. All of our high precision follow-up photometry data
are plotted in Fig. 38, along with our best-fit transit light
curve model (see also Sec. 4.4).

The frames taken by the KeplerCam detector have been
calibrated and reduced in the same fashion for all
of the observations of the six nights. Prior to the actual
calibration, all pixels which are saturated (or blooming) have
been marked (with the program fiign, see Sec. 2.12.6), forcing
them to be omitted from the upcoming photometry. During the
calibration of the frames we have used standard bias, dark and
sky-flat corrections.

Figure 37. The folded HATNet light curve of HAT-P-2 (published
in Bakos et al. 2007b), showing only the points near the
transit. The upper panel is superimposed with our best-fit model
and the lower panel shows the fit residual. See text for further
details.

Following the calibration, the detection of stars and
the derivation of the astrometrical solution was done in
two steps. First, an initial astrometrical transformation was
derived using the ~ 50 brightest and non-saturated stars
(whose parameters were derived by the program fistar,
see Sec. 2.12.8) from each frame, and by using the 2MASS
catalogue (Skrutskie et al. 2006) as a reference. The
transformation itself has been obtained by the program grmatch
(Sec. 2.12.10), with a second-order polynomial fit. Since the
astrometrical data found in the 2MASS catalogue was
obtained by the same kind of telescope, one could expect
significantly better astrometrical data from the FLWO 1.2 m
telescope due to the numerous individual frames taken at
better spatial resolution. Indeed, an internal catalog which
was derived from the detected stellar centroids by registering
them to the same reference system has shown an internal
precision of ~ 0.005 arcsec for the brighter stars, while the
2MASS catalog reports an uncertainty that is larger by an
order of magnitude: nearly ~ 0.06 arcsec. Therefore, in the
second step of the astrometry, we used this new catalog to
derive the individual astrometrical solutions for each frame,
still using a second-order polynomial fit. We note here that
this method also corrects for the systematic errors in the
photometry caused by the proper motion of the stars.
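The second-order polynomial fit used in both astrometric steps amounts to a linear least-squares problem with six coefficients per coordinate. The sketch below illustrates only this fitting step with NumPy. It is not the grmatch implementation: the function names are hypothetical, and it assumes that the catalog and image stars have already been paired (grmatch also performs the point matching itself).

```python
import numpy as np


def poly2_terms(x, y):
    """Design matrix with the monomials 1, x, y, x^2, xy, y^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])


def fit_poly2_transform(ref_xy, img_xy):
    """Least-squares second-order mapping from reference to image coordinates.

    ref_xy, img_xy : arrays of shape (N, 2) holding matched star positions.
    Returns a (6, 2) coefficient array (one column per output coordinate).
    """
    ref_xy = np.asarray(ref_xy, dtype=float)
    img_xy = np.asarray(img_xy, dtype=float)
    design = poly2_terms(ref_xy[:, 0], ref_xy[:, 1])
    coeffs, *_ = np.linalg.lstsq(design, img_xy, rcond=None)
    return coeffs


def apply_poly2_transform(coeffs, ref_xy):
    """Map reference positions to image coordinates."""
    ref_xy = np.asarray(ref_xy, dtype=float)
    return poly2_terms(ref_xy[:, 0], ref_xy[:, 1]) @ coeffs
```

In practice the reference coordinates would be projected catalog positions, and normalizing them to a range of order unity before the fit keeps the design matrix well conditioned.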

Figure 38. Follow-up light curves of HAT-P-2. The light curves
were acquired on 2007 March 18, 2007 April 21, 2007 May 08, 2007
June 22, 2008 March 24 and 2008 May 25, while the respective
transit sequence numbers were $N_{\rm tr}$ = -6, 0, +3, +11, +60 and
+71. All of these light curves are superimposed with our best-fit
model. See text for further details.

Using the above astrometrical solutions, we performed
aperture photometry (with the program fiphot,
Sec. 2.12.13) on fixed centroids, employing a set of five
apertures between 7.5 and 17.5 pixels in radius. The results of
the aperture photometry were then transformed to the same
instrumental magnitude system using a correction for the
spatial variations and the differential extinction (the former
depends on the celestial coordinates while the latter
depends on the intrinsic colors of the stars). Both corrections
were linear in the pixel coordinates and linear in the
colors. Experience shows that significant correlations can occur
between the instrumental magnitudes and some of the
external parameters of the light curves (such as the FWHM
of the stars or the subpixel positions). Although one should
detrend against these correlations using purely out-of-transit
data (both before ingress and after egress), we have carried
out such an external parameter decorrelation (EPD)
simultaneously with the light curve modelling (Sec. 4.4) due to
the lack of out-of-transit data in several cases. After the
simultaneous light curve modelling and de-trending, we chose
the aperture for each night that yielded the smallest residual.
In all of the cases this "best aperture" was neither the
smallest nor the largest one from the set, confirming our
assumptions for selecting a good aperture series. We note here
that since all of the stars on the frames were well isolated,
such a choice of different aperture radii does not
result in any systematics, because stars are not blended in
any of these apertures. In addition, due to the high flux of
HAT-P-2 and the comparison stars, the frames were slightly
extrafocal (in order to avoid saturation). This resulted in a
different FWHM per night for the stars and therefore the optimal
apertures yielding the highest signal-to-noise ratio also have
different radii for each night.
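The idea of decorrelating against external parameters simultaneously with the transit model, and then picking the aperture with the smallest residual, can be sketched as follows. This is a deliberately simplified illustration and not the procedure of the actual analysis: the transit is replaced by a box of free depth (an assumption that keeps the whole problem linear), whereas the real fit uses a limb-darkened model; all function names are hypothetical.

```python
import numpy as np


def fit_epd_with_box_transit(mag, in_transit, external):
    """Linear least-squares fit of

        mag = m0 + depth * in_transit + external @ epd_coeffs

    mag        : (N,) instrumental magnitudes
    in_transit : (N,) 1 inside the (box-shaped) transit, 0 outside
    external   : (N, K) external parameters (e.g. FWHM, subpixel x, y)
    Returns (params, rms) with params = [m0, depth, epd_coeffs...].
    """
    mag = np.asarray(mag, dtype=float)
    design = np.column_stack([
        np.ones_like(mag),
        np.asarray(in_transit, dtype=float),
        np.asarray(external, dtype=float),
    ])
    params, *_ = np.linalg.lstsq(design, mag, rcond=None)
    residual = mag - design @ params
    return params, float(np.sqrt(np.mean(residual ** 2)))


def best_aperture(mags_by_aperture, in_transit, external):
    """Return (radius with the smallest fit residual, dict of all RMS values)."""
    rms = {
        radius: fit_epd_with_box_transit(mag, in_transit, external)[1]
        for radius, mag in mags_by_aperture.items()
    }
    return min(rms, key=rms.get), rms
```

Fitting the transit depth and the EPD coefficients in one step avoids the bias that would arise if the in-transit points were detrended as if they were out-of-transit data.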
4.2 Radial velocity observations

In the discovery paper of HAT-P-2b (Bakos et al. 2007b),
13 individual radial velocity measurements were reported
that were obtained with the HIRES instrument (Vogt et al. 1994)
on the Keck I telescope, on Mauna Kea, Hawaii, plus 10
measurements from the Hamilton echelle spectrograph at
the Lick Observatory (Vogt 1987). In the last year, we have
acquired 14 additional radial velocity measurements using
the HIRES instrument on Keck. In the analysis, we have
also used the online radial velocity data obtained out of transit
by the OHP/SOPHIE spectrograph (i.e. omitting
the measurements for the Rossiter-McLaughlin effect),
published by Loeillet et al. (2008). With these additional 8
observations, we have 27 + 10 + 8 = 45 high precision RV data
points at hand for a refined analysis.

In Table 11 we collected all (previously published and
our newly obtained) radial velocity measurements. In Fig. 40
we show the RV data, overplotted with our best-fit model
solution (for details of the fit, see Sec. 4.4).



4.3 An analytical formalism for Kepler's problem 

In this section we present a set of analytic relations (based on
a few smooth functions defined in a closed form) that
provides a straightforward solution of Kepler's problem, and
consequently, time series of RV data and RV model functions.
Due to the analytic property, the partial derivatives
can also be obtained directly and therefore can be utilized
in various fitting and data analysis methods, including the
Fisher analysis of uncertainties and correlations. The functions
presented here are nearly as simple to manage as
trigonometric functions. This section has three major parts.
In Sec. 4.3.1 the basics of the mathematical formalism are
presented, including the rules for calculating partial derivatives.
In Sec. 4.3.2 the solution of the spatial problem is
shown, supplemented with the inverse problem, still using
infinitely differentiable functions. This part also discusses
how transits constrain the phase of the radial velocity curve.
Finally, in Sec. 4.3.3 we show how the presented formalism
can be implemented in practice, in the framework of the
lfit program and involving some of the built-in functions.

Table 11. Comprehensive list of relative radial velocity measurements
for HAT-P-2. The Keck measurements marked with an
asterisk and the Lick measurements are published in Bakos et al.
(2007b). The OHP/SOPHIE data are taken from Loeillet et al.
(2008).

BJD - 2,400,000    RV (m/s)    σ_RV (m/s)    Source
53981.77748           12.0         7.3       Keck*
53982.87168         -288.3         7.9       Keck*
53983.81485          569.0         7.3       Keck*
54023.69150          727.3         7.8       Keck*
54186.99824          721.3         7.7       Keck*
54187.10415          711.0         6.7       Keck*
54187.15987          738.1         6.8       Keck*
54188.01687          783.6         7.1       Keck*
54188.15961          801.8         6.7       Keck*
54189.01037          671.0         6.7       Keck*
54189.08890          656.7         6.8       Keck*
54189.15771          640.2         6.9       Keck*
54216.95938          747.7         8.1       Keck
54279.87688          402.0         8.3       Keck
54285.82384          168.3         5.7       Keck
54294.87869          756.8         6.5       Keck
54304.86497          615.5         6.2       Keck
54305.87010          764.2         6.3       Keck
54306.86520          761.4         7.6       Keck
54307.91236          479.1         6.5       Keck
54335.81260          574.7         6.8       Keck
54546.09817         -670.9        10.1       Keck
54547.11569          554.6         7.4       Keck
54549.05046          784.8         9.2       Keck
54602.91654          296.3         7.0       Keck
54603.93210          688.0         5.9       Keck
54168.96790         -152.7        42.1       Lick
54169.95190          542.4        41.3       Lick
54170.86190          556.8        42.6       Lick
54171.03650          719.1        49.6       Lick
54218.80810        -1165.2        88.3       Lick
54218.98560        -1492.6        90.8       Lick
54219.93730          -28.2        43.9       Lick
54219.96000          -14.8        43.9       Lick
54220.96410          451.6        38.4       Lick
54220.99340          590.7        37.1       Lick
54227.50160       -19401.4         8.8       OHP
54227.60000       -19408.2         6.5       OHP
54228.58420       -19558.1        18.8       OHP
54229.59930       -20187.4        16.1       OHP
54230.44750       -21224.9        14.1       OHP
54230.60290       -20853.6        14.8       OHP
54231.59870       -19531.1        12.1       OHP
54236.51900       -20220.7         5.6       OHP

4.3.1 Mathematical formalism

The solution for the time evolution of Kepler's problem can
be derived in the standard way as given in various textbooks
(see, e.g., Murray & Dermott 1999). The restricted
two body problem itself is an integrable ordinary differential
equation. In the planar case, three independent integrals
of motion exist and one variable with uniform monotonicity
(i.e. which is an affine function of time). The integrals
are related to the well known orbital elements that are
used to characterize the orbit. These are the semimajor axis
$a$, the eccentricity $e$ and the longitude of pericenter^1 $\varpi$.
The fourth quantity is the mean anomaly $M = nt$, where
$n = \sqrt{\mu/a^3} = 2\pi/P$ is the mean motion^2; $M$ is zero at
pericenter passage. The solution to Kepler's problem can
be given in terms of the mean anomaly $M$ as defined by

$$E - e\sin E = M, \qquad (98)$$

where $E$ is the eccentric anomaly. The spatial coordinates
are

$$\xi = \xi_0\cos\varpi - \eta_0\sin\varpi, \qquad (99)$$
$$\eta = \xi_0\sin\varpi + \eta_0\cos\varpi, \qquad (100)$$

where

$$\xi_0 = a(\cos E - e), \qquad (101)$$
$$\eta_0 = a\sqrt{1-e^2}\,\sin E; \qquad (102)$$

see also Murray & Dermott (1999), Sect. 2.4 for the derivation
of these equations. For circular orbits the longitude
of pericenter and the pericenter passage cannot be defined,
and for nearly circular orbits these can only be badly
constrained; in these cases it is useful to define a new variable,
the mean longitude $\lambda = M + \varpi$, to use instead of $M$. Since
$\varpi$ is an integral of the motion, $\dot\lambda = \dot M = n$. Therefore for
circular orbits $\varpi = 0$ and equations (101) - (102) should be
replaced by

$$\xi_0 = a\cos\lambda, \qquad (103)$$
$$\eta_0 = a\sin\lambda. \qquad (104)$$

^1 In two dimensions, the argument of pericenter is always equal
to the longitude of pericenter, i.e. $\varpi = \omega$.

^2 The mass parameter of Kepler's problem is denoted by $\mu =
\mathcal{G}(m_1 + m_2)$, where $m_1$ and $m_2$ are the masses of the two orbiting
bodies and $\mathcal{G}$ is the Newtonian gravitational constant.



To obtain an analytical solution to the problem, i.e. one which
is infinitely differentiable with respect to all of the orbital
elements and the mean longitude, first let us define the
Lagrangian orbital elements $k = e\cos\varpi$ and $h = e\sin\varpi$.
Substituting equations (101) - (102) into equations (99) - (100)
gives

$$\begin{pmatrix}\xi\\ \eta\end{pmatrix} = a\left[\begin{pmatrix}c\\ s\end{pmatrix} + \frac{e\sin E}{2-\ell}\begin{pmatrix}+h\\ -k\end{pmatrix} - \begin{pmatrix}k\\ h\end{pmatrix}\right], \qquad (105)$$

where $c = \cos(\lambda + e\sin E)$, $s = \sin(\lambda + e\sin E)$ and $\ell =
1 - \sqrt{1-e^2}$ is the oblateness of the orbit. The derivation of
the above equation is straightforward, one should only keep
in mind that $E + \varpi = \lambda + e\sin E$. In the first part of this
section we prove that the quantities

$$p(\lambda,k,h) = \begin{cases} 0 & \text{if } k = 0 \text{ and } h = 0, \\ e\sin E & \text{otherwise} \end{cases} \qquad (106)$$

and

$$q(\lambda,k,h) = \begin{cases} 0 & \text{if } k = 0 \text{ and } h = 0, \\ e\cos E & \text{otherwise} \end{cases} \qquad (107)$$

are analytic (infinitely differentiable) functions of $\lambda$, $k$ and
$h$ for all real values of $\lambda$ and for all $k^2 + h^2 = e^2 < 1$.
In the following parts, we utilize the partial derivatives of
these analytic functions to obtain the orbital velocities, and
we also derive some other useful relations. In this section we
only deal with planar orbits; the three dimensional case is
discussed in the next section.

4.3.1.1 Partial derivatives and the analytic property.
A real function is analytic when all of its partial
derivatives exist, the partial derivatives are continuous
functions and only depend on other analytic functions.
It is proven in Pál (2009) that the partial derivatives of
$q = q(\lambda, k, h)$ and $p = p(\lambda, k, h)$ are the following for
$(k,h) \neq (0,0)$:

$$\frac{\partial q}{\partial\lambda} = \frac{-p}{1-q}, \qquad (108)$$
$$\frac{\partial q}{\partial k} = \frac{\cos(\lambda+p) - k}{1-q}, \qquad (109)$$
$$\frac{\partial q}{\partial h} = \frac{\sin(\lambda+p) - h}{1-q} \qquad (110)$$

and

$$\frac{\partial p}{\partial\lambda} = \frac{q}{1-q}, \qquad (111)$$
$$\frac{\partial p}{\partial k} = \frac{+\sin(\lambda+p)}{1-q}, \qquad (112)$$
$$\frac{\partial p}{\partial h} = \frac{-\cos(\lambda+p)}{1-q}. \qquad (113)$$

Since for all $k^2 + h^2 < 1$, $q < 1$ and therefore $1 - q > 0$,
all of the above functions are continuous on their domains.
Since the $\sin(\cdot)$ and $\cos(\cdot)$ functions are analytic,
one can conclude that the functions $q(\cdot,\cdot,\cdot)$ and $p(\cdot,\cdot,\cdot)$ are
also analytic.

Substituting the definition of $p = p(\lambda, k, h)$ into
equation (105), one can write

$$\begin{pmatrix}\xi\\ \eta\end{pmatrix} = a\left[\begin{pmatrix}\cos(\lambda+p)\\ \sin(\lambda+p)\end{pmatrix} + \frac{p}{2-\ell}\begin{pmatrix}+h\\ -k\end{pmatrix} - \begin{pmatrix}k\\ h\end{pmatrix}\right], \qquad (114)$$

while the radial distance of the orbiting body from the center
is $\sqrt{\xi^2 + \eta^2} = r = a(1-q)$. For small eccentricities in
equation (114) the third term $(k, h)$ is negligible compared to the
first term $(\cos, \sin)$ while the second term $(h, -k)\,p/(2-\ell)$ is
negligible compared to the third term. Therefore for $e \ll 1$,
$p$ is proportional to the phase offset in the polar angle of
the orbiting particle (as defined from the geometric center
of the orbit) and $q$ is proportional to the distance offset
relative to a circular orbit; both are caused by the non-zero orbital
eccentricity.

Since equation (114) is a combination of purely analytic
functions, the solution of Kepler's problem is analytic with
respect to the orbital elements $a$, $(k,h)$, and to the mean
longitude $\lambda$ in the domain $a > 0$ and $k^2 + h^2 < 1$. We note
here that this formalism omits the parabolic and hyperbolic
solutions. The formalism based on the Stumpff functions
(see Stiefel & Scheifele 1971) provides a continuous set of
formulae for the elliptic, parabolic, and hyperbolic orbits
but this parametrization is still singular in the $e \to 0$ limit.

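4.3.1.2 Orbital velocities. Assuming a non-perturbed
orbit, i.e. when $(\dot k, \dot h) = 0$ and $\dot a = 0$, and when the mean
motion $n = \dot\lambda$ is constant, the orbital velocities can be
directly obtained by calculating the partial derivative of
equation (114) with respect to $\lambda$ and applying the chain rule since

$$\frac{\mathrm{d}}{\mathrm{d}t}\begin{pmatrix}\xi\\ \eta\end{pmatrix} = \frac{\partial}{\partial\lambda}\begin{pmatrix}\xi\\ \eta\end{pmatrix}\frac{\mathrm{d}\lambda}{\mathrm{d}t} = n\,\frac{\partial}{\partial\lambda}\begin{pmatrix}\xi\\ \eta\end{pmatrix}. \qquad (115)$$

Substituting the partial derivative of equation (111) into the
expansion of $\partial\xi/\partial\lambda$ and $\partial\eta/\partial\lambda$ one gets

$$\begin{pmatrix}\dot\xi\\ \dot\eta\end{pmatrix} = \frac{an}{1-q}\left[\begin{pmatrix}-\sin(\lambda+p)\\ +\cos(\lambda+p)\end{pmatrix} + \frac{q}{2-\ell}\begin{pmatrix}+h\\ -k\end{pmatrix}\right]. \qquad (116)$$

Note that equation (116) is also a combination of purely
analytic functions, so the components of the orbital velocity
are analytic with respect to the orbital elements $a$, $(k,h)$,
and to the mean longitude $\lambda$.

It is also evident that the time derivative of
equation (116) is

$$\begin{pmatrix}\ddot\xi\\ \ddot\eta\end{pmatrix} = -\frac{an^2}{(1-q)^3}\left[\begin{pmatrix}\cos(\lambda+p)\\ \sin(\lambda+p)\end{pmatrix} + \frac{p}{2-\ell}\begin{pmatrix}+h\\ -k\end{pmatrix} - \begin{pmatrix}k\\ h\end{pmatrix}\right]. \qquad (117)$$

Obviously, equation (117) can be written as

$$\begin{pmatrix}\ddot\xi\\ \ddot\eta\end{pmatrix} = -\frac{n^2}{(1-q)^3}\begin{pmatrix}\xi\\ \eta\end{pmatrix}, \qquad (118)$$

which is equivalent to the equations of motion since $\mu =
n^2a^3$ and $\sqrt{\xi^2+\eta^2} = r = a(1-q)$.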


to the radial velocity analysis, however, the macro definition 
capabilities of the program can be involved in order to de- 
fine some more useful functions which then can be directly 
applied in real problems. The shell script pieces shown in 
Fig. [39] demonstrate how equations (|119|l and (|121|) are im- 
plemented in practice. The parametric derivatives of these 
functions, such as equation (|122|l or (|123|) are then derived 
automatically by If it, using the partial derivatives of the 
base functions p(A, k, h) and q(A, k, h) as well as the chain 
rule. 



4-3.2 Additional constraints given by the transits 

In the follow-up observations of planets discovered by tran- 
sits in photometric data series, the detection of variations 
in the RV signal is one of the most relevant steps, either to 
rule out transits of late-type dwarf stars, and/or blends, or 
to characterize the mass of the planet and the orbital param- 
eters. Since transit timing constrains the epoch and orbital 
period much more precisely than radial velocity alone, these 
two can be assumed to be fixed in the analysis of the RV 
data. However, this constraint also includes an additional 
feature. The mean longitude has to be shifted to the tran- 
sits since it is 7r/2 only for circular orbits at the time of the 
transit. It can be shown that the mean longitude at the time 
instance of the transit is 



Atr = arg k + 



kh 
2-V 



h- 



2-1 



k{l-t} 



(119) 



therefore the mean longitude at the orbital phase tp becomes 
A = Atr + 2nip. Thus, the observed radial velocity signal 
is proportional to the 77 component of the velocity vector, 
namely 



RV = 7 -)- Kqv, 

V — r){\tr + 2-KLp,k,h), 



(120) 
(121) 



where 7 is the mean barycentric velocity and Kq is related 
to the semi-amplitude K as Ko = K\/l — e^. Consequently, 
the partial derivatives of the v = rj RV component, v — 
Vi^ti + 2ii(p, k, h) with respect to the orbital elements k and 
h are 

dv ^ drj_ drj_d\t^ 

dk dk dX dk ' ^ ' 

dv ^ 9^ 9?)9Atr , . 

dh dh dX dh ' ^ ' 

A radial velocity curve of a star, caused by the perturbation
of a single companion, can be parametrized by six quantities:
the semi-amplitude of the RV variations, $K$; the zero point,
$G$; the Lagrangian orbital elements, $(k, h)$; the epoch, $T_0$
(or equivalently the phase at an arbitrary fixed time instant);
and the period, $P$. In the case of transiting planets, the
latter two are known, since the photometric observations of the
transits constrain both quantities far more precisely than the
RV data alone could. Therefore, one has to fit only four
quantities, i.e. $\mathbf{a} = (K, G, k, h)$.

4.3.3 Practical implementation

The eccentric offset functions $p(\lambda, k, h)$ and $q(\lambda, k, h)$ are
implemented in the program lfit (see also Sec. 2.12.16).
This program does not provide further functionality related



4.4 Analysis of the HAT-P-2 planetary system 

In this section we briefly describe the analysis of the avail- 
able photometric and radial velocity data of HAT-P-2 in 
order to determine the planetary parameters as accurately 
as possible. The modelling was done in three major steps in 
an iterative way. The first step was the modelling of the light 
curve and the radial velocity data series. Second, this was 
followed by the determination of the stellar parameters. In 
the last step, by combining the light curve parameters with 
the stellar properties, we obtained the physical parameters 
(mass, radius) of the planet. 

To model transit light curves taken in optical or near-infrared
photometric passbands, we include the effect of the stellar limb
darkening. We have used the formalism of Mandel & Agol (2002)
to model the flux decrease during transits under the assumption
of a quadratic limb darkening law. Since the limb darkening
coefficients are functions of the stellar atmospheric parameters
(such as effective temperature $T_{\rm eff}$, surface gravity
$\log g_\star$ and metallicity), the whole light curve analysis
should be preceded by the initial derivation of these
parameters. These parameters were obtained by collaborators,
using the iodine-free template spectrum obtained by the HIRES
instrument on Keck I and employing the Spectroscopy Made Easy
(SME) software package (Valenti & Piskunov 1996), supported by
the atomic line database of Valenti & Fischer (2005). This
analysis yields $T_{\rm eff}$, $\log g_\star$, [Fe/H] and the
projected rotational velocity $v\sin i$. The results of the SME
analysis, when all of these values were adjusted simultaneously,
were $\log g_\star = 4.22 \pm 0.14$ (CGS),
$T_{\rm eff} = 6290 \pm 110$ K, [Fe/H] $= 0.12 \pm 0.08$ and
$v\sin i = 20.8 \pm 0.2$ km s$^{-1}$.

The limb darkening coefficients are then derived for the $z'$
and $I$ photometric bands by interpolation, using the tables
provided by Claret (2000) and Claret (2004). The initial values
for the coefficients were $\gamma_1^{(z)} = 0.1430$,
$\gamma_2^{(z)} = 0.3615$, $\gamma_1^{(I)} = 0.1765$, and
$\gamma_2^{(I)} = 0.3688$. After the first iteration, with the
knowledge of the stellar parameters, the SME analysis is
repeated by fixing the surface gravity to the value yielded by
the stellar evolution modelling. This can be done in a
straightforward way: the normalized semimajor axis $a/R_\star$
can be obtained from the transit light curve model parameters,
the orbital eccentricity and the argument of pericenter. As it
was pointed out by Sozzetti et al. (2007), the ratio $a/R_\star$
is a more effective luminosity indicator than the stellar
surface gravity, since the stellar density is related to it as

$$\rho_\star \propto (a/R_\star)^3. \tag{124}$$



Since HAT-P-2b is a quite massive planet, i.e. $M_p/M_\star \sim 0.01$,
relation (124) requires a significant correction, which

    lfit \
        -x "eoc(l,k,h)=cos(l+eop(l,k,h))" \
        -x "eos(l,k,h)=sin(l+eop(l,k,h))" \
        -x "J(k,h)=sqrt(1-k*k-h*h)" \
        -x "lamtranxy(x,y,k,h)=arg(k+x+h*(k*y-h*x)/(1+J(k,h)),h+y-k*(k*y-h*x)/(1+J(k,h)))-(k*y-h*x)*J(k,h)/(1+k*x+h*y)" \
        -x "lamtran(l0,k,h)=lamtranxy(cos(l0),sin(l0),k,h)" \
        -x "prx0(l,k,h)=(+eoc(l,k,h)+h*eop(l,k,h)/(1+J(k,h))-k)" \
        -x "pry0(l,k,h)=(+eos(l,k,h)-k*eop(l,k,h)/(1+J(k,h))-h)" \
        -x "rvx0(l,k,h)=(-eos(l,k,h)+h*eoq(l,k,h)/(1+J(k,h)))/(1-eoq(l,k,h))" \
        -x "rvy0(l,k,h)=(+eoc(l,k,h)-k*eoq(l,k,h)/(1+J(k,h)))/(1-eoq(l,k,h))" \
        -x "prx1(l,k,h)=prx0(l+lamtranxy(0,1,k,h),k,h)" \
        -x "pry1(l,k,h)=pry0(l+lamtranxy(0,1,k,h),k,h)" \
        -x "rvx1(l,k,h)=rvx0(l+lamtranxy(0,1,k,h),k,h)" \
        -x "rvy1(l,k,h)=rvy0(l+lamtranxy(0,1,k,h),k,h)" \
        -x "rvbase(l,k,h)=rvy1(l,k,h)" \

Figure 39. Macro definitions for lfit, implementing some functions related to radial velocity analysis. All of the above functions are
based on the eccentric offset functions eop(.,.,.) and eoq(.,.,.) as defined by equations (106) and (107).
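To make the macros of Figure 39 easier to follow, here is a rough Python transcription of the ones needed for `rvbase`. It is a sketch under stated assumptions, not the lfit implementation: `eccentric_offsets` is a hypothetical stand-in for the built-in `eop`/`eoq` functions, taken here to satisfy $p = k\sin(\lambda+p) - h\cos(\lambda+p)$ (Kepler's equation written with $k$ and $h$) and $q = k\cos(\lambda+p) + h\sin(\lambda+p)$, solved by plain fixed-point iteration. The helper `rv_classical` evaluates the textbook expression $\cos(v+\omega) + e\cos\omega$ for comparison.

```python
import math

def eccentric_offsets(lam, k, h, max_iter=500):
    """Return (p, q); assumed definitions, see the lead-in text."""
    p = 0.0
    for _ in range(max_iter):
        p_next = k * math.sin(lam + p) - h * math.cos(lam + p)
        converged = abs(p_next - p) < 1e-15
        p = p_next
        if converged:
            break
    q = k * math.cos(lam + p) + h * math.sin(lam + p)
    return p, q

def lamtranxy(x, y, k, h):
    """Mean longitude at which the orbit points towards the direction (x, y)."""
    j = math.sqrt(1.0 - k * k - h * h)
    c = k * y - h * x
    # arg(a, b) of the macro is taken to be atan2(b, a)
    return (math.atan2(h + y - k * c / (1.0 + j), k + x + h * c / (1.0 + j))
            - c * j / (1.0 + k * x + h * y))

def rvbase(l, k, h):
    """Transcription of rvbase = rvy1: l is counted from the transit."""
    j = math.sqrt(1.0 - k * k - h * h)
    lam = l + lamtranxy(0.0, 1.0, k, h)
    p, q = eccentric_offsets(lam, k, h)
    return (math.cos(lam + p) - k * q / (1.0 + j)) / (1.0 - q)

def rv_classical(l, k, h):
    """cos(v + omega) + e*cos(omega), i.e. the RV curve in units of K."""
    e = math.hypot(k, h)
    omega = math.atan2(h, k)
    mean_anom = l + lamtranxy(0.0, 1.0, k, h) - omega
    ecc = mean_anom
    for _ in range(500):                     # fixed-point Kepler solver
        ecc = mean_anom + e * math.sin(ecc)
    v = 2.0 * math.atan2(math.sqrt(1.0 + e) * math.sin(0.5 * ecc),
                         math.sqrt(1.0 - e) * math.cos(0.5 * ecc))
    return math.cos(v + omega) + k
```

With these definitions, `sqrt(1 - e^2) * rvbase(l, k, h)` should reproduce `rv_classical(l, k, h)`. This is the relation $K_0 = K\sqrt{1-e^2}$ between the two amplitude conventions.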
Figure 40. Radial velocity measurements for HAT-P-2 folded
with the best-fit orbital period. Filled dots represent the 
OHP/SOPHIE data, open circles show the Lick/Hamilton, while 
the open boxes mark the Keck/HIRES observations. In the up- 
per panel, all of these three RV data sets are shifted to zero 
mean barycentric velocity. The RV data are superimposed with 
our best-fit model. The lower panel shows the residuals from the 
best-fit. Note the different vertical scales on the two panels. The 
transit occurs at zero orbital phase. See text for further details. 

also depends on observable quantities (see Pál et al. 2008b
for more details). In our case, this correction is not
negligible since $M_p/M_\star$ is comparable to the typical
relative uncertainties in the light curve parameters.

4.4.1 Light curve and radial velocity parameters 

The first major step of the analysis is the determination of
the light curve and radial velocity parameters. We performed
a joint fit by adjusting the light curve and radial velocity
parameters simultaneously as described below.

The parameters can be classified into three major groups. The
light curve parameters that are related to the physical
properties of the planetary system are the transit epoch $E$,
the period $P$, the fractional planetary radius
$p \equiv R_p/R_\star$, the impact parameter $b$, and the
normalized semimajor axis $a/R_\star$. The physical radial
velocity parameters are the RV semi-amplitude $K$, the orbital
eccentricity $e$ and the argument of pericenter $\omega$. In the
third group there are parameters that are not related to the
physical properties of the system, but are rather specific to
the instrumentation. These are the out-of-transit instrumental
magnitudes of the follow-up (and HATNet) light curves, and the
RV zero-points $\gamma_{\rm Keck}$, $\gamma_{\rm Lick}$ and
$\gamma_{\rm OHP}$ of the three individual data sets.

Footnote (on the RV zero-points): Since in the reduction of the
Loeillet et al. (2008) data a synthetic stellar spectrum was
used as a reference, $\gamma_{\rm OHP}$ is the physical
barycentric radial velocity of the system. In the reductions of
the Keck and Lick data, we used one of the spectra as a
template, therefore the zero-points of these two are arbitrary
and lack any real physical interpretation.

To minimize the correlation between the adjusted parameters, we
use a slightly different parameter set. Instead of adjusting
the epoch and period, we fitted the first and last available
transit center times, $T_{-148}$ and $T_{+71}$. Here the indices
denote the transit event number: the $N_{\rm tr} = 0$ event was
defined as the first complete follow-up light curve taken on
2007 April 21, the first available transit observation from the
HATNet data was the event $N_{\rm tr} = -148$, and the last
follow-up, observed on 2008 May 25, was the event
$N_{\rm tr} = +71$. Note that assuming equidistant transit
cadences, all of the transit centers available in the HATNet and
follow-up photometry are constrained by these two transit
instances (see Bakos et al. 2007c; Pál et al. 2008a).
Similarly, instead of the eccentricity $e$ and argument of
pericenter $\omega$, we have adjusted the Lagrangian orbital
elements $k = e\cos\omega$ and $h = e\sin\omega$. These elements
show no correlation in practice; moreover, the radial velocity
curve is an analytic function of these even for $e \to 0$ cases
(although in the case of HAT-P-2b this is irrelevant because $e$
is non-zero). As it is known in the literature (Winn et al.
2007b; Pál 2008), the impact parameter $b$ and $a/R_\star$ are
also strongly correlated, especially for small
$p \equiv R_p/R_\star$ values. Therefore, as it was suggested by
Bakos et al. (2007c), we chose the parameters $\zeta/R_\star$
and $b^2$ for fitting instead of $a/R_\star$ and $b$, where for
eccentric orbits $\zeta/R_\star$ is related to $a/R_\star$ as

$$\frac{\zeta}{R_\star} = \frac{a}{R_\star}\,\frac{2\pi}{P}\,\frac{1}{\sqrt{1-b^2}}\,\frac{1+h}{\sqrt{1-e^2}}. \tag{125}$$

The quantity $\zeta/R_\star$ is related to the transit duration
as $T_{\rm dur} = 2(\zeta/R_\star)^{-1}$, if the duration is
defined between the time instants when the center of the planet
crosses the limb of the star inwards and outwards.
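As a quick numerical illustration of the relation between $a/R_\star$, $\zeta/R_\star$ and the transit duration, the following sketch evaluates equation (125). The function names are hypothetical, and the input values in the usage note are nominal numbers quoted elsewhere in this chapter, used only as an example.

```python
import math

def zeta_over_rstar(a_over_rstar, period, b2, k, h):
    """zeta/R* (in 1/day if the period is in days) from equation (125)."""
    e2 = k * k + h * h
    mean_motion = 2.0 * math.pi / period
    return (a_over_rstar * mean_motion / math.sqrt(1.0 - b2)
            * (1.0 + h) / math.sqrt(1.0 - e2))

def transit_duration(zeta):
    """Duration between the two limb crossings of the planet center."""
    return 2.0 / zeta
```

Calling `zeta_over_rstar(9.21, 5.6334697, 0.125, -0.5119, -0.0543)` gives a value close to the fitted $\zeta/R_\star \approx 12.09\,{\rm day}^{-1}$, which corresponds to a duration of roughly 0.165 days.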



4.4.2 Effects of the orbital eccentricity

Let us denote the projected radial distance between the center
of the planet and the center of the star (normalized by
$R_\star$) by $d$. As it was shown in Pál (2008), $d$ can be
parametrized in a second order approximation as

$$d^2 = (1-b^2)\left(\frac{\zeta}{R_\star}\right)^2(\Delta t)^2 + b^2, \tag{126}$$



where $\Delta t$ is the time between the actual observation time
and the intrinsic transit center. The intrinsic transit center
is defined as the instant when the planet reaches its maximal
tangential velocity during the transit. Although the tangential
velocity cannot be measured directly, the intrinsic transit
center is determined purely by the radial velocity data, without
any knowledge of the transit geometry. For eccentric orbits the
impact parameter $b$ is related to the orbital inclination $i$
as

$$b = \frac{a}{R_\star}\cos i\,\frac{1-e^2}{1+h}. \tag{127}$$



In order to have a better description of the transit light
curve, we used a higher order expansion in the $d(\Delta t)$
function (Eq. 126). For circular orbits, such an expansion is
straightforward. To derive the expansion for elliptic orbits, we
employed the method of Lie-integration, which gives the solution
of any ordinary differential equation (here, the equations for
the two-body problem) in a recursive series for the Taylor
expansion with respect to the independent variable (here, the
time). It can be shown using the Taylor expansion of the orbital
motion that the normalized projected distance $d$ up to fourth
order is

$$d^2 = b^2\left[1 - 2R\varphi - (Q - R^2)\varphi^2 - \frac{1}{3}QR\varphi^3\right] + \left(\frac{\zeta}{R_\star}\right)^2(1-b^2)\,\Delta t^2\left[1 - \frac{1}{3}Q\varphi^2 + \frac{1}{2}QR\varphi^3\right], \tag{128}$$

where

$$Q = \left(\frac{1+h}{1-e^2}\right)^3 \tag{129}$$

and

$$R = \frac{1+h}{(1-e^2)^{3/2}}\,k. \tag{130}$$

Footnote (on the intrinsic transit center): In other words,
predictions can only be made for the intrinsic transit center in
cases where the planet was discovered by a radial velocity
survey and initially we have no further constraint for the
geometry of the system.



Here $n = 2\pi/P$ is the mean motion, and $\varphi$ is defined
as $\varphi = n\Delta t$. For circular orbits, $Q = 1$ and
$R = 0$, and for small eccentricities ($e \ll 1$),
$Q \approx 1 + 3h$ and $R \approx k$. The leading order
correction term in $\varphi$, $-2b^2R\varphi$, is related to the
time lag between the photometric and intrinsic transit centers.
The photometric transit center is defined halfway between the
instants when the center of the planet crosses the limb of the
star inward and outward. It is easy to show by solving the
equation $d(\varphi) = 1$, yielding two solutions
($\varphi_{\rm I}$ and $\varphi_{\rm E}$), that this phase lag
is:

$$\Delta\varphi = \frac{\varphi_{\rm I} + \varphi_{\rm E}}{2} \tag{131}$$

$$\approx \frac{b^2R}{\left(\frac{\zeta}{R_\star}\frac{1}{n}\right)^2(1-b^2) - (Q-R^2)\,b^2} \tag{132}$$

$$\approx \left(\frac{b}{a/R_\star}\right)^2\frac{k}{(1+h)\sqrt{1-e^2}}, \tag{133}$$

which can result in a time lag of several minutes.
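The expansion and the phase lag can be explored numerically. The sketch below is illustrative only, with hypothetical function names. It assumes the form of equations (128)–(130) given above, and it obtains the phase lag as the mean of the two roots of $d^2 = 1$ when the expansion is truncated at second order in $\varphi$.

```python
import math

def q_and_r(k, h):
    """Coefficients Q and R of equations (129) and (130)."""
    e2 = k * k + h * h
    q = ((1.0 + h) / (1.0 - e2)) ** 3
    r = (1.0 + h) * k / (1.0 - e2) ** 1.5
    return q, r

def d_squared(dt, period, zeta, b2, k, h):
    """Squared normalized projected distance, following equation (128)."""
    phi = 2.0 * math.pi / period * dt
    q, r = q_and_r(k, h)
    first = b2 * (1.0 - 2.0 * r * phi - (q - r * r) * phi ** 2
                  - q * r * phi ** 3 / 3.0)
    second = (zeta ** 2 * (1.0 - b2) * dt ** 2
              * (1.0 - q * phi ** 2 / 3.0 + q * r * phi ** 3 / 2.0))
    return first + second

def phase_lag(period, zeta, b2, k, h):
    """Mean of the two roots of d^2 = 1, second-order truncation."""
    n = 2.0 * math.pi / period
    q, r = q_and_r(k, h)
    return b2 * r / ((zeta / n) ** 2 * (1.0 - b2) - (q - r * r) * b2)
```

For a circular orbit the expansion can be compared with the exact expression $d^2 = (a/R_\star)^2\sin^2\varphi + b^2\cos^2\varphi$. For the fitted parameters of HAT-P-2b quoted in Sec. 4.4.3, `phase_lag` corresponds to a lag of about one minute in absolute value.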

In equation (128), the third order terms in $\varphi$ describe
the asymmetry between the slopes of the ingress and egress
parts of the light curve. For some other aspects of light curve
asymmetries see Loeb (2005) and Barnes (2007). In the cases
when no assumptions are known for the orbital eccentricity, we
cannot treat the parameters $R$ and $Q$ as independent since the
intrinsic transit center and $R$ have an exceptionally high
correlation. However, a simpler model function, in which only
the third order terms in $\varphi$ have a fitted coefficient,
i.e.

$$d^2 = b^2\left[1 - \varphi^2 - \frac{1}{3}C\varphi^3\right] + \left(\frac{\zeta}{R_\star}\right)^2(1-b^2)\,\Delta t^2\left[1 - \frac{1}{3}\varphi^2 + \frac{1}{2}C\varphi^3\right], \tag{134}$$



yields a non-zero value for the $C$ coefficient for asymmetric
light curves. In the case of HAT-P-2b, the derived values for
$Q$ and $R$ are $Q = 2.23 \pm 0.10$ and $R = -0.789 \pm 0.021$
(derived from the values of $k$ and $h$, see Sec. 4.4.3), thus
the coefficient for the third order term in $\varphi$ is
$QR = -1.75 \pm 0.13$. Using equation (134), for an "ideal"
light curve (with similar parameters of $k$, $h$,
$\zeta/R_\star$ and $b^2$ as for HAT-P-2b), the best fit value
for $C$ is $C = -2.23$, which is close to the value of
$QR \approx -1.75$. The difference between the best fit value of
$C$ and the fiducial value of $QR$ arises because in equation
(134) the coefficients for the first and second order terms were
fixed to be 0 and 1, respectively. Although this asymmetry can
be measured directly (without leading to any degeneracy between
the fit parameters), in practice we need extreme photometric
precision to obtain a significant detection for a non-zero $C$
parameter: assuming a photometric time series for a single
transit of HAT-P-2b with 5 sec cadence where each individual
measurement has a photometric error of 0.01 mmag(!), the
uncertainty in $C$ is $\pm 0.47$, equivalent to a 5-$\sigma$
detection of the light curve asymmetry. This detection would be
hard for ground-based instrumentation (i.e. for a 1-$\sigma$
detection one should achieve a photometric precision of
0.05 mmag at the same cadence). Space missions like Kepler
(Borucki et al. 2007) will be able to detect orbital
eccentricity relying only on photometry of primary transits.

Table 12. Stellar parameters for HAT-P-2. The values of
effective temperature, metallicity and projected rotational
velocity are based on purely spectroscopic data (SME) while the
other ones are derived from both the spectroscopy (SME) and the
joint modelling (LC+Y$^2$).

Parameter                    Value                    Source
$T_{\rm eff}$ (K)            6290 ± 60                SME
[Fe/H]                       +0.14 ± 0.08             SME
$v\sin i$ (km s$^{-1}$)      20.8 ± 0.3               SME
$M_\star$ ($M_\odot$)        1.34 ± 0.04              Y$^2$+LC+SME
$R_\star$ ($R_\odot$)        $1.60^{+\ldots}_{-\ldots}$    Y$^2$+LC+SME
$\log g_\star$ (cgs)         4.158 ± 0.031            Y$^2$+LC+SME
$L_\star$ ($L_\odot$)        $\ldots^{+\ldots}_{-0.3}$     Y$^2$+LC+SME
$M_V$ (mag)                  3.36 ± 0.12              Y$^2$+LC+SME
Age (Gyr)                    2.7 ± 0.5                Y$^2$+LC+SME
Distance (pc)                118 ± 8                  Y$^2$+LC+SME



4.4.3 Joint fit 

As it was discussed before, in order to achieve a
self-consistent fit, we performed a simultaneous fit on all of
the light curve and radial velocity data. We used equation (128)
to model the light curves, where the parameters $Q$ and $R$ were
derived from the actual values of $k$ and $h$, using equations
(129) and (130). To find the best-fit values for the parameters
we used the downhill simplex algorithm (see Press et al. 1992)
and we used the method of refitting to synthetic data sets to
get an a posteriori distribution for the adjusted values. The
final results of the fit were
$T_{-148} = 2453379.10281 \pm 0.00141$,
$T_{+71} = 2454612.83271 \pm 0.00075$,
$K = 958.9 \pm 13.9$ m s$^{-1}$, $k = -0.5119 \pm 0.0040$,
$h = -0.0543 \pm 0.0098$, $R_p/R_\star \equiv p = 0.0724 \pm 0.0010$,
$b^2 = 0.125 \pm 0.073$,
$\zeta/R_\star = 12.090 \pm 0.046$ day$^{-1}$,
$\gamma_{\rm Keck} = 318.4 \pm 6.6$ m s$^{-1}$,
$\gamma_{\rm Lick} = 77.0 \pm 30.4$ m s$^{-1}$,
$\gamma_{\rm OHP} = -19868.9 \pm 9.8$ m s$^{-1}$.
The uncertainties of the out-of-transit magnitudes were between
$(6 \ldots 21) \times 10^{-5}$ mag for the follow-up light curves
and $16 \times 10^{-5}$ mag for the HATNet data. The fit resulted
in a normalized $\chi^2$ value of 0.995. As described in the
following subsection, the resulting distribution was then used
as an input for the stellar evolution modelling.
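Because the transits are assumed to be equidistant, the two fitted transit centers determine the period and the center of any other event by linear interpolation. A minimal sketch of this bookkeeping follows (hypothetical function names; the numbers in the usage note are the fit results quoted above):

```python
def period_and_epoch(t_first, n_first, t_last, n_last):
    """Period and the center of the N_tr = 0 event from two transit centers."""
    period = (t_last - t_first) / (n_last - n_first)
    epoch = t_first - n_first * period
    return period, epoch

def transit_center(n_tr, period, epoch):
    """Center of the transit event with index n_tr (equidistant transits)."""
    return epoch + n_tr * period
```

With $T_{-148} = 2453379.10281$ and $T_{+71} = 2454612.83271$, the period comes out close to 5.63347 days. The center of the event $N_{\rm tr} = +23$ falls at about HJD 2454342.4262, which agrees with the period and epoch listed in Table 13.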



4.4.4 Stellar parameters

The second step of the analysis was the derivation of the
physical stellar parameters. Following the complete Monte-Carlo
way of parameter estimation, as it was described by Pál et al.
(2008a), we calculated the distribution of the stellar density,
derived from the $a/R_\star$ values. To be more precise, the
density of the star is

$$\rho_\star = \rho_0 - \frac{\Sigma_0}{R_\star}, \tag{135}$$

where both $\rho_0$ and $\Sigma_0$ are directly related to
observable quantities, namely

$$\rho_0 = \frac{3\pi}{GP^2}\left(\frac{a}{R_\star}\right)^3, \tag{136}$$

$$\Sigma_0 = \frac{3K\sqrt{1-e^2}}{2PG\sin i}\left(\frac{a}{R_\star}\right)^2. \tag{137}$$
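To give a feeling for the size of the two terms in equation (135), the following sketch evaluates $\rho_0$ and $\Sigma_0$ in CGS units. The function names and the value of the gravitational constant are choices made for this illustration. The inputs suggested in the usage note are nominal values quoted in this chapter.

```python
import math

G_CGS = 6.674e-8      # gravitational constant, cm^3 g^-1 s^-2
DAY_S = 86400.0       # seconds per day

def rho0(period_days, a_over_rstar):
    """rho_0 of equation (136), in g cm^-3."""
    period = period_days * DAY_S
    return 3.0 * math.pi / (G_CGS * period ** 2) * a_over_rstar ** 3

def sigma0(period_days, k_rv_cm_s, e, sin_i, a_over_rstar):
    """Sigma_0 of equation (137), in g cm^-2."""
    period = period_days * DAY_S
    return (3.0 * k_rv_cm_s * math.sqrt(1.0 - e * e)
            / (2.0 * period * G_CGS * sin_i) * a_over_rstar ** 2)

def stellar_density(rho_0, sigma_0, rstar_cm):
    """Stellar density from equation (135)."""
    return rho_0 - sigma_0 / rstar_cm
```

For example, with $P = 5.6334697$ d, $a/R_\star = 9.21$, $K = 958.9$ m s$^{-1}$, $e = 0.5148$, $i = 87.2^\circ$ and a trial radius of $1.6\,R_\odot$, the correction term $\Sigma_0/R_\star$ is well below one per cent of $\rho_0$. It is still not negligible when the light curve parameters are known to a similar relative precision.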



In equation (135), the only unknown quantity is the radius of
the star, which can be derived using a stellar evolution model,
and it depends on a luminosity indicator (that is, in practice,
the surface gravity or the density of the star), a color
indicator (which is the effective surface temperature
$T_{\rm eff}$, given by the SME analysis) and the stellar
composition (here [Fe/H]). Therefore, one can write

$$R_\star = R_\star(\rho_\star, T_{\rm eff}, [{\rm Fe/H}]). \tag{138}$$

Since both $T_{\rm eff}$ and [Fe/H] are known from stellar
atmospheric analysis, equation (135) and equation (138) have two
unknowns, and thus this set of equations can be solved
iteratively. Note that in order to solve equation (138),
supposing its parameters are known in advance, one has to use a
certain stellar evolutionary model. Such models are available in
tabulated form, therefore the solution of the equation requires
the inversion of the interpolating function on the tabulated
data. Thus, equation (138) is only a symbolic notation for the
algorithm which provides the solution. Moreover, if the star is
evolved, the isochrones and/or evolutionary tracks for the
stellar models intersect themselves, resulting in an ambiguous
solution (i.e. it is not a "function" any more). For HAT-P-2,
however, the solution of equation (138) is definite since the
host star is a main sequence star. To obtain the physical
parameters (e.g. the stellar radius), we used the stellar
evolutionary models of Yi et al. (2001), by interpolating the
values of $\rho_\star$, $T_{\rm eff}$ and [Fe/H] using the
interpolator provided by Demarque et al. (2004).
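The iterative solution of equations (135) and (138) can be sketched as a simple fixed-point loop. The stellar model used below is NOT the Yonsei-Yale interpolation. `toy_radius_model` is a hypothetical power-law stand-in that ignores temperature and metallicity, included only so that the loop can run. In a real application it would be replaced by the inversion of the tabulated evolutionary models.

```python
R_SUN_CM = 6.957e10   # nominal solar radius in cm
RHO_SUN = 1.41        # approximate mean solar density in g cm^-3

def toy_radius_model(rho_star, teff, feh):
    """HYPOTHETICAL stand-in for R*(rho*, Teff, [Fe/H]); ignores teff, feh."""
    return R_SUN_CM * (rho_star / RHO_SUN) ** (-0.45)

def solve_density_and_radius(rho_0, sigma_0, radius_model, teff, feh,
                             rel_tol=1e-13, max_iter=100):
    """Iterate rho* = rho_0 - Sigma_0 / R*(rho*, Teff, [Fe/H]) to convergence."""
    rho = rho_0                       # start by neglecting the planet
    for _ in range(max_iter):
        radius = radius_model(rho, teff, feh)
        rho_next = rho_0 - sigma_0 / radius
        if abs(rho_next - rho) <= rel_tol * rho_0:
            return rho_next, radius
        rho = rho_next
    raise RuntimeError("iteration did not converge")
```

Since the correction term is small compared to $\rho_0$, the loop converges in a handful of iterations. In the Monte-Carlo approach described here, this solution would be repeated for every point of the input distribution.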

The procedure described above has been applied to all of the
parameters in the input set, where the values of $\rho_0$ have
been derived from the values of $a/R_\star$ and the orbital
period $P$ using equation (136), while the values for
$T_{\rm eff}$ and [Fe/H] have been drawn from Gaussian random
variables with the mean and standard deviation of the first SME
results ($T_{\rm eff} = 6290 \pm 110$ K and
[Fe/H] $= 0.12 \pm 0.08$). This step resulted in the a
posteriori distribution of the physical stellar parameters,
including the surface gravity. The value and uncertainty for the
latter was $\log g_\star = 4.16 \pm 0.04$ (CGS), which is
slightly smaller than the value provided by the SME analysis. To
reduce the uncertainties in $T_{\rm eff}$ and [Fe/H], we
repeated the SME modelling by fixing the value of
$\log g_\star$ to the above. This second SME run resulted in
$T_{\rm eff} = 6290 \pm 60$ K and [Fe/H] $= 0.14 \pm 0.08$.
Subsequently, we updated the values for the limb darkening
parameters ($\gamma_1^{(z)} = 0.1419$,
$\gamma_2^{(z)} = 0.3634$, $\gamma_1^{(I)} = 0.1752$, and
$\gamma_2^{(I)} = 0.3707$), and repeated the simultaneous light
curve and radial velocity fit. The results of this fit were then
used to repeat the stellar evolution modelling, which yielded
among other parameters $\log g_\star = 4.158 \pm 0.031$ (CGS).
Since the value of $\log g_\star$ did not change significantly,
we accepted these stellar parameter values as final ones. The
stellar parameters are summarized in Table 12 and the light
curve and radial velocity parameters are listed in the top two
blocks of Table 13.



4.4.5 Planetary parameters

In the previous two steps of the analysis, we determined the
light curve, radial velocity and stellar parameters. In order to
get the planetary parameters, we combined the two Monte-Carlo
data sets that yield their a posteriori distribution in a
consistent way. For example, the mass of the planet is
calculated using

$$M_p = \frac{2\pi}{P}\,\frac{K\sqrt{1-e^2}}{G\sin i}\left(\frac{a}{R_\star}\right)^2 R_\star^2, \tag{139}$$

where the values for the period $P$, RV semi-amplitude $K$,
eccentricity $e$, inclination $i$, and normalized semimajor axis
$a/R_\star$ were taken from the results of the light curve and
RV fit while the values for $R_\star$ were taken from the
respective points of the stellar parameter distribution. From
the distribution of the planetary parameters, we obtained the
mean values and uncertainties. We derived $M_p = \ldots$ for the
planetary mass and $R_p = 1.123^{+\ldots}_{-0.054}\,R_{\rm Jup}$
for the radius, while the correlation between these parameters
was $C(M_p, R_p) = 0.68$. The planetary parameters are
summarized in the lower block of Table 13.
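Equation (139) is easy to evaluate for a single set of parameters. The sketch below is illustrative, with hypothetical function and constant names. It works in CGS units and converts the result to Jupiter masses using an assumed value of $1.898 \times 10^{30}$ g. In the actual analysis the formula is applied to every point of the Monte-Carlo distributions rather than to one nominal set of values.

```python
import math

G_CGS = 6.674e-8        # gravitational constant, cm^3 g^-1 s^-2
DAY_S = 86400.0         # seconds per day
R_SUN_CM = 6.957e10     # nominal solar radius in cm
M_JUP_G = 1.898e30      # assumed Jupiter mass in g

def planet_mass_g(period_days, k_rv_cm_s, e, inc_deg, a_over_rstar, rstar_cm):
    """Planetary mass in grams from equation (139)."""
    period = period_days * DAY_S
    sin_i = math.sin(math.radians(inc_deg))
    semimajor_axis = a_over_rstar * rstar_cm
    return (2.0 * math.pi / period
            * k_rv_cm_s * math.sqrt(1.0 - e * e) / (G_CGS * sin_i)
            * semimajor_axis ** 2)
```

With the nominal values $P = 5.6334697$ d, $K = 958.9$ m s$^{-1}$, $e = 0.5148$, $i = 87.2^\circ$, $a/R_\star = 9.21$ and $R_\star = 1.60\,R_\odot$, the function returns a mass of several Jupiter masses, as expected for this massive planet.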

Due to the eccentric orbit and the lack of knowledge about the
heat redistribution of the incoming stellar flux, the surface
temperature of the planet can be constrained only with
difficulty. Assuming complete heat redistribution, the surface
temperature can be estimated by time averaging the incoming flux
which varies as $1/r^2 = a^{-2}(1 - e\cos E)^{-2}$ due to the
orbital eccentricity. The time average of $1/r^2$ is

$$\left\langle\frac{1}{r^2}\right\rangle = \frac{1}{P}\int_0^P\frac{{\rm d}t}{r^2(t)} = \frac{1}{2\pi}\int_0^{2\pi}\frac{{\rm d}M}{r^2(M)}, \tag{140}$$

where $M$ is the mean anomaly of the planet. Since
$r = a(1 - e\cos E)$ and ${\rm d}M = (1 - e\cos E)\,{\rm d}E$,
where $E$ is the eccentric anomaly, the above integral can be
calculated analytically and the result is

$$\left\langle\frac{1}{r^2}\right\rangle = \frac{1}{a^2\sqrt{1-e^2}}. \tag{141}$$
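The analytic result can be verified by direct numerical averaging over the mean anomaly. The sketch below uses hypothetical function names, solves Kepler's equation by a simple fixed-point iteration and applies the midpoint rule. The last function assumes that the equilibrium temperature scales as the fourth root of the incident flux.

```python
import math

def solve_kepler(mean_anom, e, max_iter=500):
    """Eccentric anomaly from E = M + e*sin(E) by fixed-point iteration."""
    ecc = mean_anom
    for _ in range(max_iter):
        ecc_next = mean_anom + e * math.sin(ecc)
        if abs(ecc_next - ecc) < 1e-15:
            return ecc_next
        ecc = ecc_next
    return ecc

def mean_inverse_r2(e, a=1.0, n_steps=2048):
    """Time average of 1/r^2 over one orbit (midpoint rule in mean anomaly)."""
    total = 0.0
    for i in range(n_steps):
        mean_anom = 2.0 * math.pi * (i + 0.5) / n_steps
        r = a * (1.0 - e * math.cos(solve_kepler(mean_anom, e)))
        total += 1.0 / (r * r)
    return total / n_steps

def temperature_ratio_average_to_apastron(e):
    """Orbit-averaged over apastron temperature, assuming T ~ flux**0.25."""
    return (mean_inverse_r2(e) * (1.0 + e) ** 2) ** 0.25
```

For $e = 0.5148$ the numerical average agrees with $1/\sqrt{1-e^2}$ (for $a = 1$). The resulting temperature ratio is about 1.28, in line with the two temperatures quoted in the text that follows.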



Using this time averaged weight for the incoming flux, we
derived $T_p = 1525^{+\ldots}_{-30}$ K. However, the planet
surface temperature would be $\sim 2975$ K on the dayside during
periastron, assuming no heat redistribution, while the
equilibrium temperature would be only $\sim 1190$ K if the
planet were always at its apastron distance. Thus, we conclude
that the surface temperature can vary by a factor of $\sim 3$,
depending on the actual atmospheric dynamics.

4.4.6 Photometric parameters and the distance of the system

The stellar evolution modelling (see Sec. 4.4.4) also yields
the absolute magnitudes and colors for the models for various
photometric passbands. We compared the obtained colors and
absolute magnitudes with other observations. First, the $V - I$
color of the modelled star was compared with the observations.
The TASS catalogue (Droege et al. 2006) has magnitudes for this
star, $V_{\rm TASS} = 8.71 \pm 0.04$ and
$I_{\rm TASS} = 8.16 \pm 0.05$, i.e. the observed color of the
star is $(V - I)_{\rm TASS} = 0.55 \pm 0.06$. The stellar
evolution modelling resulted in a color of
$(V - I)_{\rm YY} = 0.552 \pm 0.016$, which is in perfect
agreement with the observations. The absolute magnitude of the
star in $V$ band is $M_V = 3.36 \pm 0.12$, also given by the
stellar evolution models. This therefore yields a distance
modulus of $V_{\rm TASS} - M_V = 5.35 \pm 0.13$, which is
equivalent to a distance of $117 \pm 7$ pc, assuming no
interstellar reddening. This distance value for the star is



Table 13. Spectroscopic and light curve solutions for HAT-P-2,
and inferred planet parameters, derived from the joint modelling
of photometric, spectroscopic and radial velocity data.

Parameter                        Value

$P$ (days)                       5.6334697 ± 0.0000074
$E$ (HJD − 2,400,000)            54,342.42616 ± 0.00064
$T_{14}$ (days) (a)              0.1790 ± 0.0013
$T_{12} = T_{34}$ (days) (a)     0.0136 ± 0.0012
$a/R_\star$                      $9.21^{+\ldots}_{-\ldots}$
$R_p/R_\star$                    0.0724 ± 0.0010
$b$                              $0.354^{+0.087}_{-0.156}$
$i$ (deg)                        $87.2^{+\ldots}_{-\ldots}$

$K$ (m s$^{-1}$)                 958.9 ± 13.9
$k \equiv e\cos\omega$           −0.5119 ± 0.0040
$h \equiv e\sin\omega$           −0.0543 ± 0.0098
$e$                              0.5148 ± 0.0038
$\omega$                         186.1° ± 1.1°

$M_p$ ($M_{\rm Jup}$)            …
$R_p$ ($R_{\rm Jup}$)            $1.123^{+0.071}_{-0.054}$
$C(M_p, R_p)$                    0.68
$\rho_p$ (g cm$^{-3}$)           …
$a$ (AU)                         0.0686 ± 0.0007
$\log g_p$ (cgs)                 4.23 ± 0.04
$T_{\rm eff}$ (K)                $1525^{+\ldots}_{-30}$ (see b)

(a) $T_{14}$: total transit duration, time between first and
last contact; $T_{12} = T_{34}$: ingress/egress time, time
between first and second, or third and fourth contact.

(b) This effective temperature assumes uniform heat
redistribution while the irradiance is averaged over the orbital
revolution. See text for further details about the issue of the
planetary surface temperature.

placed right between the distance values found in the two
different available Hipparcos reductions of Perryman et al.
(1997) and van Leeuwen (2007a,b): Perryman et al. (1997) report
a parallax of $7.39 \pm 0.88$ mas, equivalent to a distance of
$135 \pm 18$ pc, while van Leeuwen (2007a,b) states a parallax
of $10.14 \pm 0.73$ mas, equivalent to a distance of
$99 \pm 7$ pc. In the two panels of Fig. 41, stellar
evolutionary isochrones are shown for the metallicity of
HAT-P-2, superimposed by the effective temperature and various
luminosity estimations based on both the above discussion
(relying only on various Hipparcos distances and TASS apparent
magnitudes) and the constraints yielded by the stellar evolution
modelling. The 2MASS magnitude of the star in $J$ band is
$J_{\rm 2MASS} = 7.796 \pm 0.027$ while the stellar evolution
models yielded an absolute magnitude of
$M_J = 2.465 \pm 0.110$. Thus, the distance modulus here is
$J_{\rm 2MASS} - M_J = 5.33 \pm 0.11$, equivalent to a distance
of $116 \pm 6$ pc, confirming the distance derived from the
photometry taken from the TASS catalogue.
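The distance estimates quoted in this section follow from two standard conversions: the distance modulus, $d = 10^{(m - M)/5 + 1}$ pc, which neglects interstellar reddening as in the text, and the inverse parallax. A minimal sketch with hypothetical function names:

```python
def distance_from_modulus(apparent_mag, absolute_mag):
    """Distance in parsec from the distance modulus (no reddening)."""
    return 10.0 ** ((apparent_mag - absolute_mag) / 5.0 + 1.0)

def distance_from_parallax(parallax_mas):
    """Distance in parsec from a parallax given in milliarcseconds."""
    return 1000.0 / parallax_mas
```

Evaluating `distance_from_modulus(8.71, 3.36)` and `distance_from_modulus(7.796, 2.465)` gives values near 117 pc and 116 pc, respectively. The two Hipparcos parallaxes of 7.39 mas and 10.14 mas correspond to about 135 pc and 99 pc.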



4.5 Discussion 

We presented refined planetary, stellar and orbital parameters
for the HAT-P-2(b) transiting extrasolar planetary system. Our
improved analysis was based on numerous radial velocity data
points, including both new measurements and data taken from the
literature. We have also carried out high precision follow-up
photometry. The refined parameters have uncertainties that are
smaller by a factor of $\sim 2$ in the planetary parameters and
a factor of $\sim 3-4$ in the orbital parameters than the
previously reported values of Bakos et al. (2007b). We note that
the density of the planet, $\rho_p = 7.6 \pm 1.1$ g cm$^{-3}$,
turned out to be significantly smaller than the value by Bakos
et al. (2007b); moreover, the uncertainty reported by Bakos et
al. (2007b) was significantly larger. In our analysis we did not
rely on the distance of the system, i.e. we did not use the
absolute magnitude as a luminosity indicator. Instead, our
stellar evolution modelling was based on the density of the
star, another luminosity indicator related to precise light
curve and RV parameters. We have compared the estimated distance
of the system (which was derived from the absolute magnitudes,
known from the stellar modelling) with the Hipparcos distances.
We found that our newly estimated distance falls between the two
values available from the different reductions of Hipparcos raw
data.

Figure 41. Stellar evolutionary isochrones from the Yonsei-Yale
models, showing the isochrones for [Fe/H] = 0.14 stars, between
0.5 and 5.5 Gyrs (with a cadence of 0.5 Gyrs). The stellar color is
indicated by the effective temperature, while the left panel shows
the luminosity using the absolute V magnitude $M_V$ and the right
panel uses the ratio $a/R_\star$ as a luminosity indicator. In the left
panel, the isochrones are overplotted by the 1-$\sigma$ and 2-$\sigma$
confidence ellipsoids, defined by the effective temperature, and the
absolute magnitude estimations from the TASS catalogue and the
two Hipparcos reductions (older: upper ellipse, recent: lower
ellipse). The diamond indicates the $M_V$ magnitude derived from
our best fit stellar evolution models. On the right plot, the
confidence ellipsoid for the effective temperature and $a/R_\star$ is shown.

The improved orbital eccentricity and argument of pericenter
allow us to estimate the time of the possible secondary
transits. We found that secondary transits occur at the orbital
phase of $\varphi_{\rm sec} = 0.1886 \pm 0.0020$, i.e. 1 day
1 hour and 30 minutes ($\pm$ 16 minutes) after primary transit
events.
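The phase of the secondary eclipse follows from the orbital elements alone. As a rough cross-check, the sketch below (hypothetical function names, not the original code) computes the mean anomaly at the true longitudes $\pi/2$ (primary transit) and $3\pi/2$ (secondary eclipse) from the classical relations and takes their difference as a fraction of the orbit. It assumes that both events are centered exactly at conjunction.

```python
import math

def mean_anomaly_at_true_longitude(true_longitude, k, h):
    """Mean anomaly at which the true longitude v + omega takes the given value."""
    e = math.hypot(k, h)
    omega = math.atan2(h, k)
    v = true_longitude - omega
    ecc_anom = 2.0 * math.atan2(math.sqrt(1.0 - e) * math.sin(0.5 * v),
                                math.sqrt(1.0 + e) * math.cos(0.5 * v))
    return ecc_anom - e * math.sin(ecc_anom)

def secondary_eclipse_phase(k, h):
    """Orbital phase of the secondary eclipse, counted from the transit."""
    m_transit = mean_anomaly_at_true_longitude(0.5 * math.pi, k, h)
    m_secondary = mean_anomaly_at_true_longitude(1.5 * math.pi, k, h)
    return ((m_secondary - m_transit) / (2.0 * math.pi)) % 1.0
```

For $k = -0.5119$ and $h = -0.0543$ the phase is close to 0.1886. Multiplied by the period of 5.6334697 days, this corresponds to a delay of roughly 1 day and 1.5 hours. A circular orbit gives exactly 0.5.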

The zero insolation planetary isochrones of Baraffe et al.
(2003) give an expected radius of
$R_{p,{\rm Baraffe03}} = 1.02 \pm 0.02\,R_{\rm Jup}$, which is
slightly smaller than the measured radius of
$1.12^{+\ldots}_{-0.05}\,R_{\rm Jup}$. The work of Fortney et al.
(2007) takes into account not only the evolutionary age and the
total mass of the planet but also the incident stellar flux and
the mass of the planet's core. By scaling the semimajor axis of
HAT-P-2b to one that yields the same incident flux from a
solar-type star on a circular orbit, taking into account both
the luminosity of the star and the correction for the orbital
eccentricity given by equation (141), we obtained
$a' = 0.033 \pm 0.003$ AU. Using this scaled semimajor axis, the
interpolation based on the tables provided by Fortney et al.
(2007) yields radii between
$R_{p,{\rm Fortney},0} = 1.142 \pm 0.003\,R_{\rm Jup}$ (core-less
planets) and
$R_{p,{\rm Fortney},100} = 1.111 \pm 0.003\,R_{\rm Jup}$
(core-dominated planets, with a core of
$M_{p,{\rm core}} = 100\,M_\oplus$). Although these values agree
nicely with our value of
$R_p = 1.123^{+\ldots}_{-0.054}\,R_{\rm Jup}$, the relatively
large uncertainty of $R_p$ excludes any further conclusion for
the size of the planet's core. Recent models of Baraffe,
Chabrier & Barman (2008) also give the radius of the planet as
the function of evolutionary age, metal enrichment and an
optional insolation equivalent to a scaled semimajor axis of
$a' = 0.045$ AU. Using this latter insolation, their models
yield $R_{p,{\rm Baraffe08},0.02} = 1.055 \pm 0.006\,R_{\rm Jup}$
(for metal poor, $Z = 0.02$ planets) and
$R_{p,{\rm Baraffe08},0.10} = 1.008 \pm 0.006\,R_{\rm Jup}$ (for
more metal rich, $Z = 0.10$ planets). These values are slightly
smaller than the actual radius of HAT-P-2b; however, the actual
insolation of HAT-P-2b is roughly two times larger than the
insolation implied by $a' = 0.045$ AU. Since the respective
planetary radii of Baraffe, Chabrier & Barman (2008) for zero
insolation are
$R^{(0)}_{p,{\rm Baraffe08},0.02} = 1.009 \pm 0.006\,R_{\rm Jup}$
and
$R^{(0)}_{p,{\rm Baraffe08},0.10} = 0.975 \pm 0.006\,R_{\rm Jup}$
for the respective cases of $Z = 0.02$ and $Z = 0.10$ metal
enrichment, an extrapolation to a two times larger insolation
would put the expected planetary radius in the range of
$\sim 1.1\,R_{\rm Jup}$. This is consistent with the models of
Fortney et al. (2007) as well as with the measurements. However,
as discussed earlier in the case of the Fortney et al. (2007)
models, the uncertainty in $R_p$ does not let us properly
constrain the metal enrichment.

HAT-P-2b will remain an interesting target, as a member of an
emerging heavy-mass population. Further photometric measurements
will refine the light curve parameters and therefore more
precise stellar parameters can also be obtained. This will yield
smaller uncertainties in the physical planetary radius, thus
some parameters of the planetary evolution models, such as the
metal enrichment, can be obtained more explicitly. Moreover,
observations of secondary eclipses will reveal the planetary
atmosphere temperature, which is now poorly constrained. Since
the secondary eclipse occurs shortly after periastron passage,
the temperature and therefore the contrast might be high enough
to detect the occultation with a good signal-to-noise ratio.



5 SUMMARY 

Transiting extrasolar planets are the only group among the 
extrasolar planets whose basic physical parameters, such as 
mass and radius can be determined without any ambiguity. 
Therefore, these planets provide a great opportunity to de- 
termine other properties, such as the characteristics of the 
planetary interior or their atmosphere. Recently, wide-field 
photometric surveys became the most prominent observa- 
tion techniques for detecting transiting planets and these 
surveys yielded several dozens of discoveries. Since such 
wide-field surveys yield massive amount of data which can- 
not be efficiently and consistently processed by the available 
existing software solutions, I started developing a new pack- 
age in order to overcome the related problems. The devel- 
opment of this package has been related to the Hungarian- 
made Automated Telescope Network (HATNet) project, one 
of the most successful initiatives searching for transiting ex- 
trasolar planets. 

The aims of my work were both implementing the algo- 
rithms related to the photometric reduction in a form of a 
standalone software package, as well as applying these pro- 
grams in the analysis of the HATNet data. Additionally, the 
photometric reduction is intended to work on data obtained 
by other facilities, typically Im-class telescopes (such as the 
48" telescope at Fred Lawrence Whipple Observatory or the 
Schmidt telescope at the Piszkesteto Mountain Station). 

Of course, both the confirmation of planetary candidates 
and the characterization of known objects require 
other types of techniques, such as spectroscopy, radial 
velocity measurements and stellar evolution modelling. In 
order to perform a consistent determination of the planetary, 
orbital and stellar parameters of transiting exoplanetary 
systems, my work also focused on including these additional 
types of measurements and methods in the data analysis. 

In this PhD thesis I have presented a new software package 
intended to perform photometric data reduction on large 
numbers of astronomical images. Existing software solutions 
do not provide a consistent framework for the reduction of 
images acquired by wide-field and undersampled 
instrumentation. During the development and implementation 
of the related algorithms, I focused on these issues 
in order to obtain a homogeneous reduction 
environment, ranging from the calibration of frames to the 
final light curve generation and analysis. This new package 
has been successfully applied in processing the HATNet 
images and has led to the discovery and confirmation of almost 
a dozen transiting extrasolar planets. 



ACKNOWLEDGMENTS 

First of all, I would like to thank my parents and family for 
their unwavering support throughout the years of my studies 
and my whole life. 

I would like to thank Gaspar Bakos for inviting me to the 
project and for the opportunity to join the community 
working in the field of transiting extrasolar planets. I also 
thank my supervisor, Balint Erdi, for the opportunity to be 
a PhD student at the Eotvos University and for his help 
with the proofreading. I am grateful for the hospitality of the 
Harvard-Smithsonian Center for Astrophysics, where this 
work was partially carried out. I would like to thank 
Gaspar, his wife Krisztina Meiszel and other friends, 
Istvan Cziegler, Gabor Furesz, Bence Kocsis, Maria Peto 
and David Vegh, for their continuous and great help with 
life in Cambridge and around Boston. 

I thank Brigitta Sipocz for her comments, ideas for 
improvements and bug reports related to the data analysis 
programs. I also thank my collaborators at the 
Harvard-Smithsonian CfA and the Konkoly Observatory, Joel 
Hartman, Gabor Kovacs, Geza Kovacs, Robert Noyes and 
Guillermo Torres, for their help. 

I would like to thank my friends Balazs Dianiska, 
Agnes Kospal, Andras Laszlo and Judit Szulagyi for their 
encouragement and for their valuable comments on this 
dissertation. Likewise, I thank Eric Agol, Daniel Fabrycky, Bence 
Kocsis and Joshua Winn for their help during the preparation 
of various articles related to this field of science. I also 
thank Edward Miller, one of my former roommates, for his 
help with the earlier versions of the draft. 

Last but not least, I would like to thank Peter Abraham, 
Ferenc Horvai and Csaba Kiss for their ongoing support 
and for the opportunity to continue the related research at the 
Konkoly Observatory. 



References 

Agol, E., Steffen, J., Sari, R., & Clarkson, W., 2005, MN- 
RAS, 359, 567 

Aigrain, S., Hodgkin, S., Irwin, J., Hebb, L., Irwin, M., 
Favata, F., Moraux, E., Pont, F., 2007, MNRAS, 375, 29 
Akerlof, C, et al.2000, AJ, 119, 1901 
Alard, C. & Lupton, R. H. 1998, ApJ, 503, 325 
Alard, C. 2000, A&AS, 144, 363 
Alonso, R., et al. 2004, ApJL, 613, L153 
Alonso, R., et al. 2008, A&A, 482, 21 

Bailes, M., Lyne, A. C. & Shemar, S. L. 1991, Nature, 352, 
L311 

Bakos, C. A. et al, 2002, PASP, 114, 974 
Bakos, C. A. et al., 2004, PASP, 116, 266 
Bakos, C. A. 2004, PhD Thesis 
Bakos, C. A. et al, 2007, ApJ, 656, 552 
Bakos, C. A. et al, 2007, ApJ, 670, 826 
Bakos, C. A. et al., 2007, ApJ, 671, L173 
Bakos, C. A. et al., 2009, ApJ, 696, 1950 
Bakos, C. A., Torres, C, Pal, A. et al. 2009, ApJ, submit- 
ted (|arXiv:0 901.0282) 
Baraffe, I. et al. 2003, A&A, 402, 701 

Baraffe, I., Chabrier, C. and Barman, T. 2008, A&A, 482, 
315 

Barbieri, M. et al. 2007, A&A, 476, L13 
Barge, P. et al., 2008, A&A, 482, 17 
Barnes, J. W., 2007, PASP, 119, 986 
Baluev, R. V., 2008, MNRAS, 389, 1375 
Beatty, T. C. et al., 2007, ApJ, 663, 573 
Berlin, E. & Arnouts, S. 1996, A&AS, 117, 393 



© 0000 RAS, MNRAS 000, 000-000 



Tools for discovering and characterizing extrasolar planets 67 



Borucki, W. J. et al., 2007, ASP Conf. Ser., 366, 309 

Bouchy, F. et al., 2005, A&A, 444, 15 

Bramich, D. M. et al., 2005, MNRAS, 359, 1096 

Bramich, D. M. 2008, MNRAS, 386, 77 

Brown, T. M., Charbonneau, D., Gilliland, R. L., Noyes, 

R. W. & Burrows, A., 2001, ApJ, 552, 699 
Brown, T. M., 2003, ApJL, 593, L125 

Burke, Christopher J., Gaudi, B. Scott, DePoy, D. L., 
Pogge, Richard W., Pinsonneault, Marc H., 2004, AJ, 127, 
2382 

Burrows, A., Hubeny, I., Budaj, J., & Hubbard, W. B. 

2007, ApJ, 661, 502 
Butler, R. P., Marcy, G. W., Wilhams, E., McCarthy, C, 

Dosanjh, P. & Vogt, S. S., 1996, PASP, 108, 500 
Butler, R. P., Marcy, G. W., Fischer, D. A., Brown, T. M., 

Contos, A. R., Korzennik, S. G., Nisenson, P., Noyes, R. 

W. 1999, ApJ, 526, 916 
Butler, R. P., Marcy, G. W. 1996, ApJL, 464, 153 
Butler, R. P. et al., 2004, ApJ, 617, 580 
Butler, R. P., Wright, J., Marcy, G., Fischer, D., Vogt, S., 

Tinney, C, Jones, H., Carter, B., Johnson, J, McCarthy 

& C. Penny, A., 2006, ApJ, 646, 505 
Calabretta, M. R., Greisen, E. W. 2002, A&A, 395, 1077 
Cameron, A. C. et al. 2006, MNRAS, 373, 799 
Cameron, A. C. et al. 2007, MNRAS, 375, 951 
Carpenter, J. 2001, AJ, 121, 2851 

Carter, J. A., Yee, J. C, Eastman, J., Gaudi, B. S. & Winn, 

J. N. 2008, ApJ, 689, 499 
Charbonneau, D., Brown, T. M., Latham, D. W. & Major, 

M., 2000, ApJ, 529, 45 
Charbonneau, D., Brown, T. M., Burrows, A. & Laugh- 

lin, G. 2006, in Conference Proceedings of Protostars and 

Planets V, B. Reipurth, D. Jewitt, and K. Keil (eds.). 

University of Arizona Press, Tucson, 701 
Claret, A. 2000, A&A, 363, 1081 
Claret, A. 2004, A&A, 428, 1001 

Demarque, P., Woo, J.-H., Kim, Y.-C. & Yi, S. K. 2004, 
ApJ, 155, 667 

D'Angelo, G., Lubow, S. H. & Bate, M. R., 2006, ApJ, 652, 
1698 

Droege, T. F., Richmond, M. W., & Sallman, M., 2006, 

PASP, 118, 1666 
Fabrycky, D. & Tremaine, S. 2007, ApJ, 669, 1298 
Fabrycky, D., 2008, ApJ, 677, L117 
Finn, L. S., 1992, Phys. Rev. D, 46, 5236 
Fischer, D. A., Marcy, G. W., Butler, R. P., Laughlin, G., 

Vogt, S. S. 2002, ApJ, 564, 1028 
Fischer, D. A. et al., 2007, ApJ, 669, 1336 
Ford, E., 2004, AJ, 129, 1706 
Ford, E. B. & Holman, M. J., 2007, ApJ, 664, 51 
Ford, E. 2008, AJ, 135, 1008 
Ford, E. B., Rasio, F. A. 2007, ApJ, 686, 621 
Fortney, J. J., Marley, M. S. & Barnes, W. 2007, ApJ, 659, 

1661 

Fressin, F., Guillot, T., Morello, V., & Pont, F. 2007, A&A, 
475, 729 

Gaudi, B. S. 2005, ApJ, 628, L73 

Gillon, M. et al. 2007, A&A, 472, L13 

Gionis, A.: Computational Geometry: Nearest Neighbor 



Problem (lecture notes) 
Groth, E. J. 1986, AJ, 91, 1244 

Guillot, T., Santos, N. C, Pont, F., Iro, N., Melo, C, & 

Ribas, I. 2006, A&A, 453, L21 
Hansen, B., & Barman, T. 2007, ApJ, 671, 861 
Hartman, J. D., Bakos, G. A., Stanek, K. Z., Noyes, R. W. 

2004, AJ, 128, 1761 
Hartman, J. D., Gaudi, B. S., Holman, M. J., McLeod, B. 

A., Stanek, K. Z., Barranco, J. A., Pinsonneault, M. H., 

Meibom, S., Kalirai, J. S., 2008, ApJ, 675, 1233 
Hebrard, G., et al. 2008, A&A, 488, 763 
Henry, G. W., Marcy, G. W., Butler, R. P. & Vogt, S. S. 

2000, ApJ, 529, 41 
Holman, M. J. et al. 2007, ApJ, 664, 1185 
HoweU, S. B. 1989, PASP, 101, 616 
Hut, P., 1981, A&A, 99, 126 

Irwin, Jonathan, Aigrain, Suzanne, Hodgkin, Simon, Ir- 
win, Mike, Bouvier, Jerome, Clarke, Cathie, Hebb, Leslie, 
Moraux, Estelle, 2006, MNRAS, 370, 954 
Israel, H., Hessman, F. V. & Schuh, S. 2007, AN, 328, 16 
Johns-KruU, C. M. et al., 2008, ApJ, 677, 657 
Jordan, A. & Bakos, G. A., 2008, ApJ, 685, 543 
Joye, W. A. & Mandel, E., 2003, Astronomical Data Anal- 
ysis Software and Systems XII, ASP Conference Series, 
Vol. 295, 489 (eds. H. E. Payne, R. I. Jedrzejewski, and 
R. N. Hook) 

Kim, D.-W., Protopapas, P. Alcock, Ch., Byun, Y.-I. & 
Bianco, F 2008, MNRAS, accepted l|arXiv:0812.1010|) 

Kochanski, G. P., Tyson, J. A. & Fischer, P 1996, AJ, 111, 
1444 

Konacki, M., Torres, G., Jha, S., Sasselov, D. D., 2003, 

Nature, 421, 507 
Kovacs, G., Zucker, S., & Mazeh, 2002, A&A, 391, 369 
Kovacs, G., Bakos, G. A., & Noyes, R. W., 2005, MNRAS, 

356, 557 

Kovacs, G. et al., 2007, ApJ, 670, L41 
Landolt, A. U. 1992, AJ, 104, 340 

Latham, D. W. 1992, in lAU CoU. 135, Complementary 
Approaches to Double and Multiple Star Research, ASP 
Conf. Ser. 32, eds. H. A. McAlister & W. I. Hartkopf (San 
Francisco: ASP), 110 
Latham, D. W. et al. 2008, ApJ, submitted 

(|arXiv:0812.1161l> 
Loeb, A., 2005, ApJL, 623 L45 
Loeillet, B. et al., 2008, A&A, 481, 529 
Lupton, R. 2007, ASP Conf. Ser., 371, 160 
Lyne, A. G., Bailes, M. 1992, Nature, 355, L213 
Mandel, K., Agol, E., 2002, ApJ, 580, L171 
Mandushev, G. et al., 2005, ApJ, 621, 1061 
Marcy, G. W. & Butler, R. P., 1992, PASP, 104, 270 
McCuUough, P. R. et al., 2005, PASP, 117, 783 
McCuUough, P. R. et al., 2006, ApJ, 648, 1228 
Mayor, M., Queloz, D., 1995, Nature, 378, 355 
Mink, D. J., 2002, ASP Conf. Ser, 281, 169 
Mochejska, B. J., Stanek, K. Z., Sasselov, D. D., Szentgy- 

orgyi, A. H. 2002, AJ, 123, 3460 
Mochejska, B. J., Stanek, K. Z., Sasselov, D. D., Szent- 
gyorgyi, A. H., Adams, E., Cooper, R. L., Foster, J. B., 



5* http://thcory.stanford.cdu/"'nmishra/CS361-2002/lecturcl2- 
scribc.pdf 



© 0000 RAS, MNRAS 000, 000-000 



68 A. Pal 



Hartman, J. D., Hickox, R. C, Lai, K., Westover, M., 
Winn, J. N. 2006, AJ, 131, 1090 
Moutou, C. et al. 2009, A&A, 498, 5 

Murray, CD. and Dermott, S. F., 1999, Solar System Dy- 
namics, Cambridge Univ. Press, Cambridge 
Naef, D. et al. 2001, A&A, 375, 27 
Noyes, R. W. et al., 2008, ApJ, 673, L79 
O'Donovan, F. T. et al., 2007, ApJ, 662, 658 
Ochsenbein, F., Bauer, R, & Marcout, J. 2000, A&A, 143, 
23 

Pal, A., Bakos, G. A., 2006, RASP, 118, 1474 
Pal, A. et al., 2008, ApJ, 680, 1450 
Pal, A., 2008, MNRAS, 390, 281 

Pal, A., Bakos, G. A., Noyes, R. W., Torres, G., 2008, lAU 

Symposium, Volume 253, 428 
Pal, A. et al. 2008, ApJ, accepted (arXiv:0810.0260J) 
Pal, A., 2008, MNRAS, accepted (arXiv:0904.0324!) 
Pepper, J., Gould, A. & DePoy, D. L. 2004, AIR Conf. 

Proc. 713: The Search for Other Worlds, 713, 185 
Pepper, J. et al., 2007, RASP, 119, 923 
Perryman, M. A. C., et al., 1997, A&A, 323, 49 
Phillips, A. C., Davis, L. E. 1995, in Astronomical Data 

Analysis Software and Systems IV, ASP Conference Series 

Vol. 77, 297 (eds. R. A. Shaw, H. E. Payne and J. J. E. 

Hayes) 

Pojmanski, G. 1997, Acta Astronomica, 47, 467 

PoUacco, D. et al., 2006, Ap&SS, 304, 253 

Press, W. H., Teukolsky, S. A., Vetterling, W.T., Flannery, 

B. P., 1992, Numerical Recipes in C: the art of scientific 
computing. Second Edition, Cambridge University Press 

Queloz, D. et al., 2001, A&A, 379, 279 

Rauer, H., Eislffel, J., Erikson, A., Guenther, E., Hatzes, 
A. P., Michaelis, H. & Voss, H. 2004, RASP, 116, 38 

Sahu, K. C, 2006, Nature, 443, 534 

Santos, N. C. et al. 2002, A&A, 392, 215 

Shewchuk, R. J. 1996, in Applied Computational Geome- 
try: Towards Geometric Engineering, ed. M. C. Lin & D. 
Manocha (Berlin: Springer), 1148, 203 

Seager, S., Malln-Ornelas, G. 2003, ApJ, 585, 1038 

Shporer, A., Mazeh, T., Moran, A., Bakos, G. A., Kovacs, 
G. & Mashal, E., 2006, Tenth Anniversary of 51 Peg- 
b: Status of and prospects for hot Jupiter studies, eds. 
L. Arnold, F. Bouchy and C. Moutou (Frontier Group, 
Paris), 196 

Shporer, A. et al., 2009, ApJ, 690, 1393 

Shupe, D. L., Moshir, Mehdrdad, Li, J., Makovoz, D., Nar- 
ron, R., Hook, R. N., 2005, ASP Conf. Ser., 347, 491^^ 

Simon, A., Szatmary, K. & Szabo, Gy. M., 2007, A&A, 470, 
727 

Skrutskie, M. F., Cutri, R. M., Stiening, R., Weinberg, M. 
D., Schneider, S., Carpenter, J. M., Beichman, C, Capps, 
R., Chester, T., Elias, J., Huchra, J., Liebert, J., Lonsdale, 

C, Monet, D. G., Price, S., Seitzer, P., Jarrett, T., Kirk- 
patrick, J. D., Gizis, J., Howard, E., Evans, T., Fowler, J., 
Fullmer, L., Hurt, R., Light, R., Kopan, E. L., Marsh, K. 
A.,, McCallon, H. L., Tam, R., Van Dyk, S., and Whee- 
lock, S., 2006, AJ, 131, 1163 

Snellen, I. A. G. et al. 2008, A&A, 497, 545 



http: / /spider. ipac.caltech.edu/stafF/shupe/distortion_vl.O. htm 



Southworth, J., Wheatley, P. J., & Sams, G. 2007, MNRAS, 
379, 11 

Sozzetti, A. et al., 2007, ApJ, 664, 1190 

Steffen, J. H. & Agol, E., 2007, ASP Conf. Ser, 366, 158 

Stetson, P. B., 1987, RASP, 99, 191 

Stetson, P. B. 1989, in Advanced School of Astrophysics, 
Image and Data Processing/Interstellar Dust, ed. B. Bar- 
bury, E. Janot-Pacheco, A. M. Magalhaes and S. M. Vie- 
gas (Sao Paulo, Instituto Astronomico e Geofi'sico) 

Stiefel, E. L., Scheifele, G., 1971, Linear and regular ce- 
lestial mechanics, perturbed two-body motion, numerical 
methods, canonical theory, Berlin, New York, Springer- 
Verlag 

Street, R. A. et al., 2003, ASP Conf. Ser. 294, 405 
Szulagyi, J., Kovacs, G. & Welch, D. L. 2009, A&A, ac- 
cepted farXiv:0903.1165) 
Takeda, G., Kita, R. & Rasio, F. A. 2008, ApJ, 683, 1063 
Tamuz, O., Mazeh, T. & Zucker, S. 2005, MNRAS, 356, 
1466 

Thiebaut, C, Boer, M. 2001, ASP Conf. Ser., 238, 388 
Tomaney, A. B., Crotts, A. P. S. 1996, AJ, 112, 2872 
Torres, G., Boden, A. F., Latham, D. W., Pan, M. & Ste- 

fanik, R. P., 2002, AJ, 124, 1716 
Torres, G., Konacki, M., Sasselov, D. D., & Jha, S., 2005, 

ApJ, 619, 558 
Torres, G. et al., 2007, ApJ, 666, L121 
Torres, G., Winn, J. N., Holman, M. J., 2008, ApJ, 677, 

1324 

Udalski, A., Szymanski, M., Kaluzny, J., Kubiak, M., 
Krzeminski, W., Mateo, M., Preston, G. W., Paczynski, 
B. 1993, AcA, 43, 289 
Udalski, A., Paczynski, B., Zebrun, K., Szymanski, M., Ku- 
biak, M., Soszynski, I., Szewczyk, O., Wyrzykowski, L., 
Pietrzynski, G. 2002, AcA, 52, 1 
Valdes, F. G., Campusano, L. E., Velasquez, J. D., Stetson, 

P. B. 1995, RASP, 107, 1119 
Valenti, J. A. & Piskunov, N., 1996, A&AS, 118, 595 
Valenti, J. A. & Fischer, D. A., 2005, ApJS, 159, 141 
van Leeuwen, F., 2007, Hipparcos, the New Reduction of 
the Raw Data, Astrophysics and Space Science Library, 
Vol. 250 

van Leeuwen, F., 2007, A&A, 474, 653 

Vogt, S. S., 1987, RASP, 99, 1214 

Vogt, S. S. et al., 1994, Proc. SPIE, 2198, 362 

Weldrake, David T. F., Bayliss, Daniel D. R., Sackett, 

Penny D., Tingley, Brandon W., Gillon, Michal, Setiawan, 

Johny 2008, ApJ, 675, 37 
Winn, J. N. et al. 2007, ApJ, 665, 167 
Winn, J. N. et al. 2007, ApJ, 134, 1707 
Wolszczan, A., Frail, D. A., 1992, Nature, 355, 145 
Wright, J. T. 2005, RASP, 117, 657 
Yi, S. K. et al., 2001, ApJS, 136, 417 
Yuan, F. & Akerlof, C. W. 2008, ApJ, 677, 808 
Young, A. T., 1967, AJ, 72, 747 



© 0000 RAS, MNRAS 000, 000-000 



